Step-by-Step Guide to Migrating SQL Relational Data to MongoDB
Learn how to migrate your SQL relational data to MongoDB with this comprehensive step-by-step guide. Discover best practices for transforming traditional schemas into efficient MongoDB document structures, including essential planning, schema design strategies like embedding and referencing, data extraction, transformation techniques, and loading into MongoDB. This tutorial provides practical examples and actionable advice for a smooth and successful transition to a NoSQL database.
Step-by-Step Guide to Migrating SQL Relational Data to MongoDB
Migrating SQL relational data to MongoDB is not a table copy. The hard part is deciding which relationships should become embedded documents, which should stay referenced, and how your application will query the new shape.
Relational schemas are often normalized across tables. MongoDB's document model often works best when related data that is read together is stored together. This guide walks through planning, transformation, loading, verification, and application changes without assuming every SQL table should become one MongoDB collection.
Understanding the Core Differences: Relational vs. Document Models
Before diving into the migration process, it's essential to grasp the conceptual differences:
- Relational Model: Data is stored in tables with predefined schemas. Relationships are managed through foreign keys, requiring JOIN operations to retrieve related data. Normalization is a key principle.
- Document Model (MongoDB): Data is stored in flexible, JSON-like documents. Documents can have varying structures. Related data can be embedded within a single document (denormalization) or referenced using application-level joins or MongoDB's
$lookupaggregation stage.
This difference in data modeling directly impacts how you design your MongoDB collections and documents.
Phase 1: Planning and Schema Design
This is the most critical phase. A well-designed MongoDB schema is key to leveraging its benefits. The goal is to model your data based on application access patterns, not just a direct translation of your SQL tables.
1. Analyze Your Application's Access Patterns
- Identify read-heavy vs. write-heavy operations: How frequently is data read, and how is it typically queried? What fields are most commonly retrieved together?
- Determine common query paths: What are the most frequent
SELECTstatements in your SQL application? Which tables are usually joined? - Understand data relationships: How are entities related? Are these one-to-one, one-to-many, or many-to-many relationships?
2. Choose Your Denormalization Strategy
MongoDB's power lies in its ability to embed related data. Consider these strategies:
- Embedding (Denormalization): The most common approach. Embed documents or arrays of documents within a parent document when the relationship is one-to-many or when data is frequently accessed together. This reduces the need for joins.
- Example: Instead of having separate
ordersandorder_itemstables, you can embedorder_itemsas an array within theorderdocument.
- Example: Instead of having separate
- Referencing: Use when embedding would lead to excessively large documents, or when data is accessed independently. Store the
_idof a related document, similar to foreign keys, and perform application-level joins or use MongoDB's$lookup.- Example: A
userscollection and apostscollection. A post might store theuser_idof its author. You can then use$lookupto retrieve the author's details when fetching a post.
- Example: A
3. Design Your MongoDB Collections and Documents
Based on your access patterns and denormalization strategy, design your collections. A good starting point is to map SQL tables to MongoDB collections. Then, decide which related data should be embedded and which should be referenced.
SQL Schema Example:
-- Customers Table
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100)
);
-- Orders Table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
TotalAmount DECIMAL(10, 2),
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- OrderItems Table
CREATE TABLE OrderItems (
OrderItemID INT PRIMARY KEY,
OrderID INT,
ProductID INT,
Quantity INT,
Price DECIMAL(10, 2),
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);
MongoDB Document Design Options:
Option A: Customer with embedded orders, if customers have a manageable number of orders and orders are frequently viewed with the customer:
{ "_id": ObjectId("..."), "customer_id": 1, "first_name": "John", "last_name": "Doe", "email": "[email protected]", "orders": [ { "order_id": 101, "order_date": ISODate("2023-10-26T00:00:00Z"), "total_amount": 50.00, "items": [ { "product_id": 1, "quantity": 2, "price": 25.00 }, { "product_id": 3, "quantity": 1, "price": 0.00 } ] } ] }Option B: Separate collections with referencing, if orders are numerous or often queried independently.
Customers collection:
{ "_id": ObjectId("..."), "customer_id": 1, "first_name": "John", "last_name": "Doe", "email": "[email protected]" }Orders collection:
{ "_id": ObjectId("..."), "order_id": 101, "customer_id": 1, "order_date": ISODate("2023-10-26T00:00:00Z"), "total_amount": 50.00, "items": [ { "product_id": 1, "quantity": 2, "price": 25.00 }, { "product_id": 3, "quantity": 1, "price": 0.00 } ] }
Considerations for Document Size: MongoDB has a document size limit (16MB). Avoid embedding excessively large arrays that could exceed this limit. If an array grows indefinitely, consider breaking it out into a separate collection.
Phase 2: Data Extraction and Transformation
Once your target schema is designed, you need to extract data from your SQL database and transform it into the new document format.
1. Extract Data from SQL
Use standard SQL queries to select the data you need. You can export this data into formats like CSV or JSON.
- Using SQL clients: Most SQL database tools (e.g., DBeaver, SQL Developer, pgAdmin) allow you to export query results to CSV or JSON.
- Scripting: Write scripts (Python, Node.js, etc.) to connect to your SQL database, execute queries, and fetch data.
2. Transform Data
This is where you'll implement your designed schema. You'll need to write code or use a tool to:
- Group related records: For example, gather all
OrderItemsbelonging to a specificOrder. - Restructure data: Convert relational rows into nested JSON documents.
- Handle data types: Ensure data types are compatible with MongoDB (e.g., dates, numbers, strings).
Example using Python:
Let's assume you've exported Customers, Orders, and OrderItems to CSV files.
import pandas as pd
import json
from bson import ObjectId
# Load data from CSV files (assuming they are in the same directory)
customers_df = pd.read_csv('customers.csv')
orders_df = pd.read_csv('orders.csv')
order_items_df = pd.read_csv('order_items.csv')
# --- Data Transformation Logic ---
# Convert DataFrame to dictionaries for easier manipulation
customers_list = customers_df.to_dict('records')
orders_list = orders_df.to_dict('records')
order_items_list = order_items_df.to_dict('records')
# Create a mapping for orders and order_items for quick lookup
orders_by_customer = {}
for order in orders_list:
customer_id = order['CustomerID']
if customer_id not in orders_by_customer:
orders_by_customer[customer_id] = []
orders_by_customer[customer_id].append(order)
order_items_by_order = {}
for item in order_items_list:
order_id = item['OrderID']
if order_id not in order_items_by_order:
order_items_by_order[order_id] = []
order_items_by_order[order_id].append(item)
# --- Building MongoDB Documents (Option A: Customer with Embedded Orders) ---
mongo_documents = []
for customer in customers_list:
mongo_doc = {
"_id": ObjectId(), # MongoDB generates _id automatically, but you can map if needed
"customer_id": customer['CustomerID'],
"first_name": customer['FirstName'],
"last_name": customer['LastName'],
"email": customer['Email'],
"orders": []
}
customer_id = customer['CustomerID']
if customer_id in orders_by_customer:
for order in orders_by_customer[customer_id]:
order_doc = {
"order_id": order['OrderID'],
"order_date": order['OrderDate'], # Ensure correct date format
"total_amount": order['TotalAmount'],
"items": []
}
order_id = order['OrderID']
if order_id in order_items_by_order:
for item in order_items_by_order[order_id]:
order_doc['items'].append({
"product_id": item['ProductID'],
"quantity": item['Quantity'],
"price": item['Price']
})
mongo_doc['orders'].append(order_doc)
mongo_documents.append(mongo_doc)
# Now 'mongo_documents' is a list of dictionaries ready to be inserted into MongoDB
# print(json.dumps(mongo_documents[0], indent=2, default=str)) # Print first document as JSON
# For Option B (Separate Collections), you would create lists for each collection:
# customers_mongo = [{'customer_id': c['CustomerID'], ...} for c in customers_list]
# orders_mongo = [{'order_id': o['OrderID'], 'customer_id': o['CustomerID'], ...} for o in orders_list]
# Save to JSON for import (optional)
# with open('mongo_customer_data.json', 'w') as f:
# json.dump(mongo_documents, f, indent=2, default=str)
3. Tools for Transformation
- Custom Scripts: Python with Pandas, Node.js with libraries like
csv-parserandmysql/pgare powerful for complex transformations. - ETL Tools: Tools like Apache NiFi, Talend, or AWS Glue can orchestrate complex data pipelines, including SQL to MongoDB migration.
- Database migration platforms: Some commercial ETL and CDC tools can sync relational sources into MongoDB. Check connector support for your exact SQL database and MongoDB target before you plan around a tool.
Phase 3: Data Loading into MongoDB
Once your data is transformed, you can load it into your MongoDB instance.
1. Connect to MongoDB
Use the MongoDB Shell (mongosh) or a MongoDB driver (for your programming language) to connect to your database.
2. Import Transformed Data
Using
mongoimport: If you exported your transformed data into a JSON file, you can usemongoimport:# Assuming your data is in mongo_customer_data.json and you want to import into 'customers' collection mongoimport --db your_database_name --collection customers --file mongo_customer_data.json --jsonArray--jsonArray: Use this flag if your JSON file contains an array of documents.
Using MongoDB Drivers: If you generated your data structures in your programming language (like the
mongo_documentslist in Python), you can insert them directly:Python example using
pymongo:from pymongo import MongoClient # Assuming 'mongo_documents' list is defined from previous Python script client = MongoClient('mongodb://localhost:27017/') db = client['your_database_name'] customers_collection = db['customers'] # Insert the transformed documents if mongo_documents: insert_result = customers_collection.insert_many(mongo_documents) print(f"Inserted {len(insert_result.inserted_ids)} documents.") else: print("No documents to insert.") client.close()
3. Verify Data Integrity
After loading, run queries in MongoDB to verify that the data has been imported correctly and matches your expectations.
// Example: Count documents in the 'customers' collection
use your_database_name;
print(db.customers.countDocuments());
// Example: Find a specific customer and check their embedded orders
db.customers.findOne({ "customer_id": 1 })
Phase 4: Application Refactoring
This is arguably the most time-consuming phase. Your application code needs to be updated to interact with MongoDB instead of SQL.
- Update Database Connections: Change connection strings and libraries.
- Rewrite Queries: Replace SQL queries with MongoDB query language using your chosen driver's API.
- Adjust Data Access Layer: Modify your ORM or data access layer to work with MongoDB documents.
- Leverage MongoDB Features: Adapt your application to take advantage of features like flexible schemas, aggregation framework, and geospatial queries if applicable.
Best Practices and Tips
- Start Small: If possible, migrate a subset of your data or a less critical application first to gain experience.
- Iterate on Schema Design: Your initial MongoDB schema might not be perfect. Be prepared to iterate and refine it based on performance testing and application feedback.
- Index Wisely: Just like in SQL, indexing is crucial for performance in MongoDB. Identify your query patterns and create appropriate indexes.
- Monitor Performance: Continuously monitor your MongoDB deployment for performance bottlenecks and optimize queries and schema as needed.
- Consider Incremental Migration: For large databases, consider an incremental migration strategy where you sync changes from SQL to MongoDB in near real-time before performing a final cutover.
Takeaway
The safest SQL-to-MongoDB migration starts with access patterns, not table names. Model one important workflow, transform a small data slice, load it into MongoDB, verify counts and sample documents, then update the application code around that shape before scaling the migration.