Best Practices: Avoiding Common MongoDB Performance Pitfalls
Avoid MongoDB performance pitfalls with focused schemas, useful indexes, projections, keyset pagination, and query monitoring.
MongoDB performance pitfalls usually start small: one unbounded array, one missing compound index, or one dashboard query that scans far more documents than expected. As your data grows, those choices can turn into slow pages, high CPU, and painful maintenance windows.
Use this review as a checklist for schema design, indexing, query shape, and operational habits.
1. Schema Design: The Foundation of Performance
Performance tuning begins long before the first query is written. How you structure your data directly impacts read and write efficiency.
Limiting Document Size and Preventing Bloat
MongoDB documents have a 16 MB BSON document size limit. You should usually stay far below that for hot operational data. Very large documents consume more memory, require more network bandwidth, and make updates more expensive.
Best Practice: Keep Documents Focused
Design documents to contain only the most essential, frequently accessed data. Use referencing for large arrays or related entities that are rarely needed alongside the parent document.
Pitfall: Storing massive historical logs or large binary files (like high-resolution images) directly within operational documents.
The Embedding vs. Referencing Trade-off
Deciding between embedding (storing related data inside the primary document) and referencing (using links via _id and $lookup) is key to optimizing read performance.
| Strategy | Best Use Case | Performance Impact |
|---|---|---|
| Embedding | Small, frequently accessed, and tightly coupled data (e.g., product reviews, address details). | Fast Reads: Fewer queries/network trips required. |
| Referencing | Large, infrequently accessed, or rapidly changing data (e.g., large arrays, shared data). | Slower Reads: Requires $lookup (join equivalent), but prevents document bloat and allows easier updates to referenced data. |
Warning: Array Growth
If an array within an embedded document can grow indefinitely, such as a list of all user actions, reference those actions from a separate collection instead. Unbounded arrays make documents larger, slow updates, and can eventually hit the document size limit.
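A minimal sketch of the referencing approach, assuming a hypothetical `user_actions` collection and illustrative field names (`user_id`, `type`, `at`) that are not part of the original text. Each action becomes its own small document that points back to the user, so the user document stops growing:

```javascript
// Hypothetical shape: one document per action in a separate `user_actions`
// collection, instead of an ever-growing `actions` array on the user document.
function toActionDocument(userId, action) {
  return {
    user_id: userId, // reference back to the parent user's _id
    type: action.type,
    at: action.at,
  };
}

// Split an embedded array into standalone documents (for example, in a one-off migration).
function splitEmbeddedActions(user) {
  const { actions = [], ...userWithoutActions } = user;
  return {
    user: userWithoutActions,
    actionDocs: actions.map((a) => toActionDocument(user._id, a)),
  };
}

const { user, actionDocs } = splitEmbeddedActions({
  _id: 1,
  name: "Alice",
  actions: [
    { type: "login", at: new Date("2023-01-01T00:00:00Z") },
    { type: "purchase", at: new Date("2023-01-02T00:00:00Z") },
  ],
});
```

In mongosh, the resulting `actionDocs` could be written with `db.user_actions.insertMany(actionDocs)`, and an index such as `{ user_id: 1, at: -1 }` would likely suit "latest actions for a user" reads.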
2. Indexing Strategies: Eliminating Collection Scans
Indexes are the single most critical factor in MongoDB performance. A Collection Scan (COLLSCAN) occurs when MongoDB has to read every document in a collection to satisfy a query, which is usually slow on large datasets.
Proactive Index Creation and Verification
Ensure that each frequent query is supported by an index on the fields in its filter clause and its sort clause and, for covered queries, on the fields in its projection.
Use the explain('executionStats') method to verify that indexes are being used and to identify collection scans.
// Check if this query uses an index
db.users.find({ status: "active", created_at: { $gt: ISODate("2023-01-01") } })
.sort({ created_at: -1 })
.explain('executionStats');
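Reading explain output by eye gets tedious, so the check can be scripted. The sketch below uses two hypothetical helpers, `collectStages` and `summarizeExplain`, and a hand-written, heavily trimmed sample of explain output. Real output has many more fields, and its exact shape varies by server version:

```javascript
// Recursively collect stage names from an explain plan node.
// Handles single-child (`inputStage`) and multi-child (`inputStages`) nodes,
// plus the `queryPlan` wrapper that some server versions nest inside winningPlan.
function collectStages(node, out = []) {
  if (!node || typeof node !== "object") return out;
  if (node.queryPlan) return collectStages(node.queryPlan, out);
  if (node.stage) out.push(node.stage);
  if (node.inputStage) collectStages(node.inputStage, out);
  for (const child of node.inputStages || []) collectStages(child, out);
  return out;
}

function summarizeExplain(explain) {
  const stages = collectStages(explain.queryPlanner.winningPlan);
  const stats = explain.executionStats;
  return {
    stages,
    usesCollScan: stages.includes("COLLSCAN"),
    // Documents examined per document returned; values far above 1 suggest a poor index.
    docsPerResult: stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
  };
}

// Hand-written sample for illustration only.
const sampleExplain = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } },
  },
  executionStats: { nReturned: 50, totalKeysExamined: 50, totalDocsExamined: 50 },
};

const summary = summarizeExplain(sampleExplain);
```

For this sample, `summary.usesCollScan` is `false` and `summary.docsPerResult` is `1`, which is the pattern to aim for: an index scan that examines about as many documents as it returns.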
The ESR Rule for Compound Indexes
Compound indexes (indexes built on multiple fields) must be ordered correctly to be maximally effective. Use the ESR Rule:
- Equality: Fields used for exact matches come first.
- Sort: Fields used for sorting usually come next.
- Range: Fields used for range operators such as $gt and $lt usually come last.
Example of the ESR Rule:
Query: Find products by category (equality), sorted by price (sort), within a rating range (range).
// Correct Index Structure based on ESR
db.products.createIndex({ category: 1, price: 1, rating: 1 })
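The ordering can also be expressed as a small helper. This is a sketch only: `buildEsrIndexKeys` is a hypothetical function, and it relies on JavaScript preserving insertion order for string keys, which is what makes the field order of the resulting index specification meaningful:

```javascript
// Build an index key specification in Equality -> Sort -> Range order.
// `equality` and `range` are arrays of field names; `sort` maps field -> 1 or -1.
function buildEsrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction;
  }
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

const keys = buildEsrIndexKeys({
  equality: ["category"],
  sort: { price: 1 },
  range: ["rating"],
});
```

Here `keys` is `{ category: 1, price: 1, rating: 1 }`, matching the index above, and could be passed straight to `db.products.createIndex(keys)`.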
Covered Queries
A Covered Query is one where the entire result set—including the query filter and the fields requested in the projection—can be fulfilled entirely by the index. This means MongoDB doesn't have to retrieve the actual documents, dramatically reducing I/O and boosting speed.
To achieve a covered query, every field in the filter and every field returned must be part of the index. The _id field is returned by default, so exclude it explicitly (_id: 0) unless it is part of the index.
// Index must include all requested fields (name, email)
db.users.createIndex({ name: 1, email: 1 });
// Covered Query - only returns fields included in the index
db.users.find({ name: 'Alice' }, { email: 1, _id: 0 });
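As a rough pre-check before running explain, the coverage rule can be sketched in plain JavaScript. The helper `canBeCovered` is hypothetical and deliberately simplified: it assumes top-level fields, an inclusion-style projection, and no array (multikey) fields:

```javascript
// Simplified static check: can `indexKeys` cover a query with this filter and projection?
// Assumptions: top-level fields only, inclusion-style projection, no multikey fields.
function canBeCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterFields = Object.keys(filter).filter((f) => !f.startsWith("$"));
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id is returned by default unless the projection sets _id: 0.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return [...filterFields, ...returned].every((f) => indexed.has(f));
}

const index = { name: 1, email: 1 };
const covered = canBeCovered(index, { name: "Alice" }, { email: 1, _id: 0 }); // true
const notCovered = canBeCovered(index, { name: "Alice" }, { email: 1 }); // false: _id is returned
```

The authoritative check remains explain('executionStats'): a covered query reports a totalDocsExamined of 0.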
3. Query Optimization and Retrieval Efficiency
Even with perfect indexing, inefficient query patterns can still severely degrade performance.
Always Use Projection
Projection limits the amount of data transferred over the network and memory consumed by the query executor. Never select all fields ({}) if you only need a subset of data.
// Pitfall: Retrieving the entire large user document
db.users.findOne({ email: '[email protected]' });
// Best Practice: Only retrieve necessary fields
db.users.findOne({ email: '[email protected]' }, { username: 1, last_login: 1 });
Avoiding Large $skip Operations (Keyset Pagination)
Using $skip for deep pagination is highly inefficient because MongoDB still has to scan and discard the skipped documents. When dealing with large result sets, use keyset pagination (also known as cursor-based or offset-free pagination).
Instead of skipping a page number, filter based on the last retrieved indexed value (e.g., _id or timestamp).
// Pitfall: Cost grows with the offset, because every skipped document is still scanned
db.logs.find().sort({ timestamp: -1 }).skip(50000).limit(50);
// Best Practice: Efficiently continues from the last _id (assumes default ObjectId _id values)
const lastId = ObjectId('...id_from_previous_page...'); // must match the type of _id, not a plain string
db.logs.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(50);
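When pages are ordered by a non-unique field such as timestamp, the keyset filter needs a tiebreaker. The following sketch assumes a sort of `{ timestamp: -1, _id: -1 }` and a matching compound index. The helper name `nextPageFilter` and the sample values are illustrative:

```javascript
// Build the filter for the next page when sorting by { timestamp: -1, _id: -1 }.
// `last` is the final document of the previous page (null for the first page).
function nextPageFilter(last) {
  if (!last) return {};
  return {
    $or: [
      { timestamp: { $lt: last.timestamp } },
      // Same timestamp: fall back to _id so ties are neither skipped nor repeated.
      { timestamp: last.timestamp, _id: { $lt: last._id } },
    ],
  };
}

const PAGE_SORT = { timestamp: -1, _id: -1 };

const firstPage = nextPageFilter(null);
const secondPage = nextPageFilter({ _id: 42, timestamp: new Date("2023-06-01T12:00:00Z") });
```

In mongosh this would be used as `db.logs.find(nextPageFilter(last)).sort(PAGE_SORT).limit(50)`, with `last` taken from the final document of the page just shown.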
4. Advanced Pitfalls in Operations and Aggregation
Complex operations like writes and data transformations require specialized optimization techniques.
Optimizing Aggregation Pipelines
Aggregation pipelines are powerful but can be resource-intensive. The key performance rule is to reduce the dataset size as early as possible.
Best Practice: Push $match and $limit Upfront
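Place the $match stage (which filters documents) at the very beginning of the pipeline so it can use an index, and apply the $limit stage (which restricts the number of documents processed) as early as the intended result allows. A $limit placed before $sort or $group changes which documents those stages see, so move it up only when that is what you want. Filtering early ensures that subsequent, more expensive stages like $group, $sort, or $project operate on the smallest possible dataset.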
// Efficient Pipeline Example
[
{ $match: { status: 'COMPLETE', date: { $gte: ISODate('2023-01-01') } } }, // Filter early (use index); assumes date is stored as a Date
{ $group: { _id: '$customer_id', total_spent: { $sum: '$amount' } } },
{ $sort: { total_spent: -1 } }
]
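The stage order can be captured in a small builder. This is a sketch assuming a hypothetical `topSpendersPipeline` helper. Here $limit follows $sort, because limiting before sorting would return an arbitrary subset rather than the top N:

```javascript
// Build a "top spenders" pipeline: filter first, then group, sort, and only then limit.
// `since` is a Date; `topN` is how many customers to return.
function topSpendersPipeline(since, topN) {
  return [
    { $match: { status: "COMPLETE", date: { $gte: since } } }, // can use an index on status/date
    { $group: { _id: "$customer_id", total_spent: { $sum: "$amount" } } },
    { $sort: { total_spent: -1 } },
    { $limit: topN }, // after $sort, so the limit keeps the true top N
  ];
}

const pipeline = topSpendersPipeline(new Date("2023-01-01T00:00:00Z"), 10);
const stageOrder = pipeline.map((stage) => Object.keys(stage)[0]);
```

With a hypothetical `orders` collection, this would run as `db.orders.aggregate(pipeline)`. When $limit immediately follows $sort, the server can combine the two so the sort only has to keep the top N results in memory.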
Managing Write Concerns
Write concern dictates the level of acknowledgement MongoDB provides for a write operation. Choosing an overly strict write concern when high durability isn't strictly necessary can severely impact write latency.
| Write Concern Setting | Latency | Durability |
|---|---|---|
| w: 1 | Low | Confirmed by primary node only. |
| w: 'majority' | High | Confirmed by the majority of replica set members. Maximum durability. |
Tip: For high-throughput, non-critical operations (like analytics or logging), consider using a lower write concern like w: 1 to prioritize speed. For financial transactions or critical data, always use w: 'majority'.
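One way to keep this choice explicit in application code is to derive the write options from how critical the operation is. The sketch below is illustrative: the helper `writeOptionsFor`, the two tiers, and the 5000 ms wtimeout are assumptions, not values from the original text:

```javascript
// Map an operation's criticality to write options. The two tiers and the
// 5000 ms timeout are illustrative assumptions.
function writeOptionsFor(criticality) {
  if (criticality === "critical") {
    // wtimeout bounds how long the write waits for majority acknowledgement.
    return { writeConcern: { w: "majority", wtimeout: 5000 } };
  }
  return { writeConcern: { w: 1 } };
}

const loggingOptions = writeOptionsFor("best-effort");
const paymentOptions = writeOptionsFor("critical");
```

In mongosh, the result can be passed as the options argument of a write, for example `db.payments.insertOne(doc, writeOptionsFor("critical"))`, where `payments` and `doc` are placeholders.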
5. Deployment and Configuration Best Practices
Beyond database schema and queries, configuration details impact overall system health.
Monitor Slow Queries
Regularly check the slow query log, or use the $currentOp aggregation stage to identify operations that are taking excessive time. The database profiler is an essential tool for this task.
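A minimal sketch of a $currentOp pipeline for spotting long-running operations. The helper name `slowOpsPipeline` and the 5-second threshold are assumptions chosen for illustration:

```javascript
// Build a pipeline that lists active operations running for at least `minSeconds`.
function slowOpsPipeline(minSeconds) {
  return [
    // allUsers: include operations from all users; idleConnections: skip idle ones.
    { $currentOp: { allUsers: true, idleConnections: false } },
    { $match: { active: true, secs_running: { $gte: minSeconds } } },
    { $project: { opid: 1, op: 1, ns: 1, secs_running: 1 } },
  ];
}

const slowOps = slowOpsPipeline(5);
```

Because $currentOp has to run against the admin database, the mongosh call would look like `db.getSiblingDB("admin").aggregate(slowOpsPipeline(5))`.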
Manage Connection Pooling
Ensure your application uses an effective connection pool. Creating and destroying database connections is expensive. A well-sized pool reduces latency and overhead. Set minimum and maximum connection pool sizes appropriate for your application traffic patterns.
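Pool sizes are commonly set through the `minPoolSize` and `maxPoolSize` connection string options. The sketch below appends them to a URI. The helper `withPoolOptions`, the host, the database name, and the sizes are placeholders rather than recommendations:

```javascript
// Append pool-size options to a MongoDB connection string.
// The host, database name, and sizes used below are placeholders.
function withPoolOptions(baseUri, { minPoolSize, maxPoolSize }) {
  if (minPoolSize > maxPoolSize) {
    throw new RangeError("minPoolSize must not exceed maxPoolSize");
  }
  const params = new URLSearchParams({
    minPoolSize: String(minPoolSize),
    maxPoolSize: String(maxPoolSize),
  });
  const separator = baseUri.includes("?") ? "&" : "?";
  return `${baseUri}${separator}${params.toString()}`;
}

const uri = withPoolOptions("mongodb://localhost:27017/app", {
  minPoolSize: 5,
  maxPoolSize: 50,
});
```

A reasonable starting point is to size the maximum from observed peak concurrency and adjust after watching connection counts under real traffic.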
Use Time-to-Live (TTL) Indexes
For collections containing transient data (e.g., sessions, log entries, cached data), implement TTL Indexes. This allows MongoDB to automatically expire documents after a defined period, preventing collections from growing uncontrollably and degrading indexing efficiency over time.
// Documents in the session collection expire about 3600 seconds after created_at, which must hold a Date (the TTL monitor runs roughly once a minute, so removal is not instant)
db.session.createIndex({ created_at: 1 }, { expireAfterSeconds: 3600 })
Keep Checking the Actual Query Plans
Avoiding MongoDB performance pitfalls is mostly about staying honest with the query planner. Keep documents focused, create compound indexes for real query patterns, use projections, avoid deep $skip, and check explain('executionStats') whenever a query becomes important to the application. As traffic changes, revisit the plans instead of assuming yesterday's index is still the right one.