Step-by-Step Guide to Migrating SQL Relational Data to MongoDB

Migrating SQL relational data to MongoDB is not a table copy. The hard part is deciding which relationships should become embedded documents, which should stay referenced, and how your application will query the new shape.

Relational schemas are often normalized across tables. MongoDB's document model often works best when related data that is read together is stored together. This guide walks through planning, transformation, loading, verification, and application changes without assuming every SQL table should become one MongoDB collection.

Understanding the Core Differences: Relational vs. Document Models

Before diving into the migration process, it's essential to grasp the conceptual differences:

Relational Model: Data is stored in tables with predefined schemas. Relationships are managed through foreign keys, requiring JOIN operations to retrieve related data. Normalization is a key principle.
Document Model (MongoDB): Data is stored in flexible, JSON-like documents. Documents can have varying structures. Related data can be embedded within a single document (denormalization) or referenced using application-level joins or MongoDB's $lookup aggregation stage.

This difference in data modeling directly impacts how you design your MongoDB collections and documents.

Phase 1: Planning and Schema Design

This is the most critical phase. A well-designed MongoDB schema is key to leveraging its benefits. The goal is to model your data based on application access patterns, not just a direct translation of your SQL tables.

1. Analyze Your Application's Access Patterns

Identify read-heavy vs. write-heavy operations: How frequently is data read, and how is it typically queried? What fields are most commonly retrieved together?
Determine common query paths: What are the most frequent SELECT statements in your SQL application? Which tables are usually joined?
Understand data relationships: How are entities related? Are these one-to-one, one-to-many, or many-to-many relationships?

2. Choose Your Denormalization Strategy

MongoDB's power lies in its ability to embed related data. Consider these strategies:

Embedding (Denormalization): The most common approach. Embed documents or arrays of documents within a parent document when the relationship is one-to-many or when data is frequently accessed together. This reduces the need for joins.
- Example: Instead of having separate orders and order_items tables, you can embed order_items as an array within the order document.
Referencing: Use when embedding would lead to excessively large documents, or when data is accessed independently. Store the _id of a related document, similar to foreign keys, and perform application-level joins or use MongoDB's $lookup.
- Example: A users collection and a posts collection. A post might store the user_id of its author. You can then use $lookup to retrieve the author's details when fetching a post.

3. Design Your MongoDB Collections and Documents

Based on your access patterns and denormalization strategy, design your collections. A good starting point is to map SQL tables to MongoDB collections. Then, decide which related data should be embedded and which should be referenced.

SQL Schema Example:

-- Customers Table
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100)
);

-- Orders Table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10, 2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- OrderItems Table
CREATE TABLE OrderItems (
    OrderItemID INT PRIMARY KEY,
    OrderID INT,
    ProductID INT,
    Quantity INT,
    Price DECIMAL(10, 2),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

MongoDB Document Design Options:

Option A: Customer with embedded orders, if customers have a manageable number of orders and orders are frequently viewed with the customer:

{
  "_id": ObjectId("..."),
  "customer_id": 1,
  "first_name": "John",
  "last_name": "Doe",
  "email": "[email protected]",
  "orders": [
    {
      "order_id": 101,
      "order_date": ISODate("2023-10-26T00:00:00Z"),
      "total_amount": 50.00,
      "items": [
        { "product_id": 1, "quantity": 2, "price": 25.00 },
        { "product_id": 3, "quantity": 1, "price": 0.00 }
      ]
    }
  ]
}

Option B: Separate collections with referencing, if orders are numerous or often queried independently.

Customers collection:

{
  "_id": ObjectId("..."),
  "customer_id": 1,
  "first_name": "John",
  "last_name": "Doe",
  "email": "[email protected]"
}

Orders collection:

{
  "_id": ObjectId("..."),
  "order_id": 101,
  "customer_id": 1,
  "order_date": ISODate("2023-10-26T00:00:00Z"),
  "total_amount": 50.00,
  "items": [
    { "product_id": 1, "quantity": 2, "price": 25.00 },
    { "product_id": 3, "quantity": 1, "price": 0.00 }
  ]
}

Considerations for Document Size: MongoDB has a document size limit (16MB). Avoid embedding excessively large arrays that could exceed this limit. If an array grows indefinitely, consider breaking it out into a separate collection.

Phase 2: Data Extraction and Transformation

Once your target schema is designed, you need to extract data from your SQL database and transform it into the new document format.

1. Extract Data from SQL

Use standard SQL queries to select the data you need. You can export this data into formats like CSV or JSON.

Using SQL clients: Most SQL database tools (e.g., DBeaver, SQL Developer, pgAdmin) allow you to export query results to CSV or JSON.
Scripting: Write scripts (Python, Node.js, etc.) to connect to your SQL database, execute queries, and fetch data.

2. Transform Data

This is where you'll implement your designed schema. You'll need to write code or use a tool to:

Group related records: For example, gather all OrderItems belonging to a specific Order.
Restructure data: Convert relational rows into nested JSON documents.
Handle data types: Ensure data types are compatible with MongoDB (e.g., dates, numbers, strings).

Example using Python:

Let's assume you've exported Customers, Orders, and OrderItems to CSV files.

import pandas as pd
import json
from bson import ObjectId

# Load data from CSV files (assuming they are in the same directory)
customers_df = pd.read_csv('customers.csv')
orders_df = pd.read_csv('orders.csv')
order_items_df = pd.read_csv('order_items.csv')

# --- Data Transformation Logic ---

# Convert DataFrame to dictionaries for easier manipulation
customers_list = customers_df.to_dict('records')
orders_list = orders_df.to_dict('records')
order_items_list = order_items_df.to_dict('records')

# Create a mapping for orders and order_items for quick lookup
orders_by_customer = {}
for order in orders_list:
    customer_id = order['CustomerID']
    if customer_id not in orders_by_customer:
        orders_by_customer[customer_id] = []
    orders_by_customer[customer_id].append(order)

order_items_by_order = {}
for item in order_items_list:
    order_id = item['OrderID']
    if order_id not in order_items_by_order:
        order_items_by_order[order_id] = []
    order_items_by_order[order_id].append(item)

# --- Building MongoDB Documents (Option A: Customer with Embedded Orders) ---
mongo_documents = []

for customer in customers_list:
    mongo_doc = {
        "_id": ObjectId(), # MongoDB generates _id automatically, but you can map if needed
        "customer_id": customer['CustomerID'],
        "first_name": customer['FirstName'],
        "last_name": customer['LastName'],
        "email": customer['Email'],
        "orders": []
    }

    customer_id = customer['CustomerID']
    if customer_id in orders_by_customer:
        for order in orders_by_customer[customer_id]:
            order_doc = {
                "order_id": order['OrderID'],
                "order_date": order['OrderDate'], # Ensure correct date format
                "total_amount": order['TotalAmount'],
                "items": []
            }

            order_id = order['OrderID']
            if order_id in order_items_by_order:
                for item in order_items_by_order[order_id]:
                    order_doc['items'].append({
                        "product_id": item['ProductID'],
                        "quantity": item['Quantity'],
                        "price": item['Price']
                    })
            mongo_doc['orders'].append(order_doc)

    mongo_documents.append(mongo_doc)

# Now 'mongo_documents' is a list of dictionaries ready to be inserted into MongoDB
# print(json.dumps(mongo_documents[0], indent=2, default=str)) # Print first document as JSON

# For Option B (Separate Collections), you would create lists for each collection:
# customers_mongo = [{'customer_id': c['CustomerID'], ...} for c in customers_list]
# orders_mongo = [{'order_id': o['OrderID'], 'customer_id': o['CustomerID'], ...} for o in orders_list]

# Save to JSON for import (optional)
# with open('mongo_customer_data.json', 'w') as f:
#     json.dump(mongo_documents, f, indent=2, default=str)

3. Tools for Transformation

Custom Scripts: Python with Pandas, Node.js with libraries like csv-parser and mysql/pg are powerful for complex transformations.
ETL Tools: Tools like Apache NiFi, Talend, or AWS Glue can orchestrate complex data pipelines, including SQL to MongoDB migration.
Database migration platforms: Some commercial ETL and CDC tools can sync relational sources into MongoDB. Check connector support for your exact SQL database and MongoDB target before you plan around a tool.

Phase 3: Data Loading into MongoDB

Once your data is transformed, you can load it into your MongoDB instance.

1. Connect to MongoDB

Use the MongoDB Shell (mongosh) or a MongoDB driver (for your programming language) to connect to your database.

2. Import Transformed Data

Using mongoimport: If you exported your transformed data into a JSON file, you can use mongoimport:

# Assuming your data is in mongo_customer_data.json and you want to import into 'customers' collection
mongoimport --db your_database_name --collection customers --file mongo_customer_data.json --jsonArray

--jsonArray: Use this flag if your JSON file contains an array of documents.

Using MongoDB Drivers: If you generated your data structures in your programming language (like the mongo_documents list in Python), you can insert them directly:

Python example using pymongo:

from pymongo import MongoClient

# Assuming 'mongo_documents' list is defined from previous Python script
client = MongoClient('mongodb://localhost:27017/')
db = client['your_database_name']
customers_collection = db['customers']

# Insert the transformed documents
if mongo_documents:
    insert_result = customers_collection.insert_many(mongo_documents)
    print(f"Inserted {len(insert_result.inserted_ids)} documents.")
else:
    print("No documents to insert.")

client.close()

3. Verify Data Integrity

After loading, run queries in MongoDB to verify that the data has been imported correctly and matches your expectations.

// Example: Count documents in the 'customers' collection
use your_database_name;
print(db.customers.countDocuments());

// Example: Find a specific customer and check their embedded orders
db.customers.findOne({ "customer_id": 1 })

Phase 4: Application Refactoring

This is arguably the most time-consuming phase. Your application code needs to be updated to interact with MongoDB instead of SQL.

Update Database Connections: Change connection strings and libraries.
Rewrite Queries: Replace SQL queries with MongoDB query language using your chosen driver's API.
Adjust Data Access Layer: Modify your ORM or data access layer to work with MongoDB documents.
Leverage MongoDB Features: Adapt your application to take advantage of features like flexible schemas, aggregation framework, and geospatial queries if applicable.

Best Practices and Tips

Start Small: If possible, migrate a subset of your data or a less critical application first to gain experience.
Iterate on Schema Design: Your initial MongoDB schema might not be perfect. Be prepared to iterate and refine it based on performance testing and application feedback.
Index Wisely: Just like in SQL, indexing is crucial for performance in MongoDB. Identify your query patterns and create appropriate indexes.
Monitor Performance: Continuously monitor your MongoDB deployment for performance bottlenecks and optimize queries and schema as needed.
Consider Incremental Migration: For large databases, consider an incremental migration strategy where you sync changes from SQL to MongoDB in near real-time before performing a final cutover.

Takeaway

The safest SQL-to-MongoDB migration starts with access patterns, not table names. Model one important workflow, transform a small data slice, load it into MongoDB, verify counts and sample documents, then update the application code around that shape before scaling the migration.