Best Practices for Managing and Reducing MongoDB Disk Space Usage
Optimize your MongoDB disk usage with this comprehensive guide to best practices. Learn effective strategies for compacting collections and indexes, identifying and dropping unnecessary indexes, and leveraging WiredTiger's compression features. Discover how to implement data archiving, manage oplog sizing, and proactively monitor disk space to prevent system outages and improve performance. This article provides actionable insights and practical examples to keep your MongoDB deployments lean and efficient.
MongoDB disk usage usually becomes urgent at the worst possible time: a batch job runs longer than expected, deletes do not seem to free space, or a replica set member starts warning that the volume is almost full. The fix is rarely one magic command. You need to know whether the space is live data, indexes, reusable WiredTiger space, oplog, logs, or local backups.
The safest approach is to measure first, reduce what no longer needs to exist, and only then run heavier maintenance such as compaction or member rebuilds. That order keeps you from creating a long maintenance event that gives back little space.
Understanding MongoDB Disk Space Consumption
MongoDB utilizes disk space for several components:
- Data Files: Stores the actual BSON documents within collections.
- Index Files: Stores B-tree indexes created to support efficient query execution.
- Journal Files (WiredTiger): Records write operations before they are applied to data files, ensuring data durability. These are pre-allocated.
- Oplog (Operations Log): A special capped collection in replica sets that records all write operations. Essential for replication.
- Diagnostic Data: Logs, mongod process files, and other system-related information.
Over time, updates and deletions leave collections and indexes fragmented or holding allocated but unused space, which makes disk usage inefficient. This "white space" is not immediately returned to the operating system, even if the database no longer needs it for live data.
Strategies for Reducing MongoDB Disk Space
1. Compacting Collections and Indexes
Compaction operations help reclaim unused disk space by rewriting data and index files more efficiently. This can be particularly useful after significant data deletions or updates.
Compacting Collections
With the WiredTiger storage engine (default since MongoDB 3.2), compact primarily reclaims free space from deleted documents and defragments collections. It does not rebuild the collection's data file from scratch like MMAPv1's compact operation did.
db.runCommand({ compact: "myCollection" })
Considerations for compact:
- compact operations can be resource-intensive (CPU, I/O) and take a significant amount of time, especially for large collections. It's often best run during maintenance windows or on secondary members of a replica set.
- Disk requirements and locking behavior vary by MongoDB version, storage engine, and deployment shape. Check the documentation for your exact version before running it on a large production collection.
- For sharded clusters, run compact on each shard independently.
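Before scheduling compaction, it helps to know which collections actually hold reusable space. The sketch below ranks collections by the "file bytes available for reuse" figure that WiredTiger reports inside collection stats. The helper names and the threshold are assumptions for illustration, not part of MongoDB.

```javascript
// Hypothetical helper: bytes WiredTiger reports as reusable inside a collection file.
// Input is the object returned by db.collection.stats().
function reusableBytes(stats) {
  const blockManager = (stats.wiredTiger || {})["block-manager"] || {};
  return Number(blockManager["file bytes available for reuse"] || 0);
}

// Rank collections by reusable space, keeping only those at or above minBytes.
// minBytes is an illustrative cutoff, not a recommendation.
function compactionCandidates(statsByName, minBytes) {
  return Object.entries(statsByName)
    .map(([name, stats]) => ({ name, reusable: reusableBytes(stats) }))
    .filter((entry) => entry.reusable >= minBytes)
    .sort((a, b) => b.reusable - a.reusable);
}

// Example usage in mongosh (collection names are placeholders):
// const names = ["orders", "events"];
// const statsByName = Object.fromEntries(
//   names.map((n) => [n, db.getCollection(n).stats()])
// );
// compactionCandidates(statsByName, 1024 ** 3);
```

A collection that does not appear in the result is unlikely to give much back, so compacting it mostly costs I/O.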
Rebuilding Indexes
Indexes can also become fragmented. Rebuilding an index can reclaim space and potentially improve query performance.
db.myCollection.reIndex()
reIndex() Considerations:
- reIndex() behavior has changed across MongoDB versions, and it can still be disruptive on busy systems. Newer releases deprecate it and restrict where it can run, so check the manual for your version, test on staging, and prefer rolling work through replica set members where possible.
- Similar to compact, reIndex() requires additional disk space during the operation.
mongod --repair (Offline Operation)
When data files are corrupted, mongod --repair can rebuild them. This is an offline operation: stop the mongod instance, then start it once with the --repair option. Older releases also exposed a repairDatabase command; current releases rely on the command-line option.
mongod --repair
Warning: Treat repair as a last resort for space reclamation. It can discard data it cannot salvage, and it can take a very long time on large datasets. Always have a tested backup first.
2. Optimizing Indexes
Indexes are crucial for performance but can consume significant disk space. Unused or redundant indexes are pure overhead.
Identifying and Dropping Unnecessary Indexes
Regularly review your indexes to ensure they are still needed.
1. List all indexes for a collection:
db.myCollection.getIndexes()
2. Monitor index usage: Use $indexStats, query plans, profiling, and your application workload history. Collection stats show index size, but they do not prove whether an index is useful.
3. Identify duplicate or redundant indexes: For example, an index on { a: 1, b: 1 } makes an index on { a: 1 } redundant for queries that can use the compound index. An index on { a: 1, b: 1 } is also covered by an index on { a: 1, b: 1, c: 1 } for queries that only involve a and b.
Once identified, drop the unused index:
db.myCollection.dropIndex("indexName")
Tip: Always test the impact of dropping an index in a staging environment before applying it to production.
Using Partial Indexes
Partial indexes only index documents in a collection that satisfy a specified filter expression. This reduces the number of documents indexed, saving disk space and improving write performance.
db.orders.createIndex(
  { customerId: 1, orderDate: -1 },
  { partialFilterExpression: { status: "active" } }
)
This index would only include documents where status is "active", reducing its size if most orders are historical, cancelled, archived, or otherwise outside the hot path. The important part is not the word "active"; it is the habit of indexing the subset your application actually queries every day.
Start With a Disk-Space Triage, Not a Cleanup Command
When MongoDB disk space is growing, the first mistake is to jump straight to compact, repair, or deleting old data. Those actions can help, but they can also create load, take locks in some situations, or hide the real problem for a few weeks. Start by answering three questions:
- Which filesystem is filling up: the database path, the journal path, the log path, or the backup volume?
- Is live data growing, or is allocated-but-unused space growing after deletes and updates?
- Is the growth coming from collections, indexes, the oplog, logs, diagnostic data, or snapshots?
A quick first pass usually looks like this:
df -h
du -h --max-depth=1 /var/lib/mongodb | sort -h
du -h --max-depth=1 /var/log/mongodb | sort -h
Then check MongoDB from inside the shell:
db.adminCommand({ listDatabases: 1 })
db.getSiblingDB("app").stats()
db.getSiblingDB("app").orders.stats()
storageSize, totalIndexSize, and dataSize tell different stories. If dataSize is growing, you probably have a data lifecycle problem. If storageSize is much larger than dataSize, you may be looking at reusable internal space after deletes. If totalIndexSize is large compared with dataSize, index design deserves attention before you touch compaction.
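As a rough illustration of that reading, the sketch below turns the three numbers into hints. The function name and both ratio thresholds are assumptions chosen for the example. Because dataSize is reported uncompressed while storageSize reflects compressed files, a ratio below the threshold does not prove there is no reusable space, so treat the output as a prompt for further checks rather than a diagnosis.

```javascript
// Hypothetical triage helper for the output of db.collection.stats().
// reuseRatio and indexRatio are illustrative thresholds; tune them for your deployment.
function triageStats(stats, opts = {}) {
  const { reuseRatio = 1.5, indexRatio = 1.0 } = opts;
  const { dataSize = 0, storageSize = 0, totalIndexSize = 0 } = stats;
  const hints = [];

  // Storage far above live data suggests reusable space left behind by deletes.
  if (dataSize > 0 && storageSize / dataSize >= reuseRatio) {
    hints.push("storage-much-larger-than-data");
  }
  // Indexes rivaling the data itself suggest an index review comes first.
  if (dataSize > 0 && totalIndexSize / dataSize >= indexRatio) {
    hints.push("indexes-large-relative-to-data");
  }
  // Neither overhead stands out, so growth is probably live data.
  if (hints.length === 0) {
    hints.push("no-obvious-overhead");
  }
  return hints;
}

// Example usage in mongosh:
// triageStats(db.getSiblingDB("app").orders.stats());
```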
Understand What MongoDB Can and Cannot Give Back
With WiredTiger, deleting documents usually makes space available for reuse by MongoDB. It does not always return that space to the operating system immediately. That behavior surprises people during emergency cleanup: they delete a large batch, run df -h, and see almost no improvement.
That does not mean the delete failed. It means MongoDB can often reuse that space for future inserts and updates. If the goal is to stop growth, deleting or archiving old data may be enough. If the goal is to shrink the filesystem because the volume is almost full or the host is being downsized, you may need compaction, resyncing a replica set member, or a dump-and-restore style rebuild.
For production systems, I usually separate the work into two tracks. The first track is immediate safety: add disk, remove obvious log buildup, pause risky batch jobs, or move backups off the database volume. The second track is real reduction: fix retention, remove unused indexes, and rebuild storage only after you know where the bytes went.
Fix Data Retention Before You Defragment Anything
If your application keeps request logs, events, sessions, notifications, job records, or analytics documents forever, disk usage will return no matter how carefully you compact. MongoDB gives you a few practical options.
For data that expires on a simple timestamp, a TTL index is often the cleanest answer:
db.sessions.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0 }
)
That index removes documents after the date stored in expiresAt. It is useful for sessions, temporary tokens, short-lived import jobs, or cached API responses. It is not a replacement for business retention rules. The TTL monitor runs in the background, so do not expect second-by-second deletion, and do not use TTL on data that requires an approval workflow before deletion.
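With expireAfterSeconds set to 0, the application decides the lifetime by writing the expiry date into each document. A minimal sketch of that write side follows; the function name and every field except expiresAt are hypothetical.

```javascript
// Build a session document for a collection that has a TTL index on expiresAt
// with expireAfterSeconds: 0. userId and createdAt are illustrative fields.
function newSession(userId, ttlMinutes, now = new Date()) {
  return {
    userId,
    createdAt: now,
    // Must be stored as a date: TTL skips documents whose indexed field is not a date.
    expiresAt: new Date(now.getTime() + ttlMinutes * 60 * 1000),
  };
}

// Example usage in mongosh:
// db.sessions.insertOne(newSession("user-123", 30));
```

Keeping the lifetime in the document rather than in the index means different sessions can expire on different schedules without rebuilding the index.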
For business records, archive instead of blindly deleting. A common pattern is:
- Copy documents older than the retention window to cheaper storage or an archive database.
- Verify counts and a sample of important fields.
- Delete in small batches from the primary collection.
- Watch replication lag and disk metrics while the job runs.
Small batches matter. A single huge delete can create replication pressure, fill logs, and make rollback harder if someone realizes the filter was wrong. A safer batch job might delete a few thousand documents at a time, sleep briefly, and record progress by _id or timestamp.
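const cutoff = ISODate("2025-01-01T00:00:00Z");
const batchSize = 2000; // tune for your workload
while (true) {
  // Collect one bounded batch of _ids, then delete only those documents.
  const ids = db.events
    .find({ createdAt: { $lt: cutoff }, archived: true }, { _id: 1 })
    .limit(batchSize)
    .toArray()
    .map((doc) => doc._id);
  if (ids.length === 0) break;
  const result = db.events.deleteMany({ _id: { $in: ids } });
  print(`deleted ${result.deletedCount}`);
  sleep(500); // pause so secondaries can keep up
}
In a real production script, also log each batch, record the last processed _id or timestamp, and stop automatically if replication lag or disk I/O crosses your threshold.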
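One way to get the automatic stop is to pass the health check in as a callback. This is a sketch written for mongosh, where cursor calls behave synchronously; deleteInBatches, shouldStop, and pause are hypothetical names, and what shouldStop measures (replication lag, disk I/O) is left to you.

```javascript
// Delete matching documents in bounded batches, checking a stop hook between batches.
// coll is a mongosh collection such as db.events.
function deleteInBatches(coll, filter, opts = {}) {
  const {
    batchSize = 2000,            // illustrative default
    pauseMs = 500,               // illustrative default
    shouldStop = () => false,    // e.g. return true when replication lag is too high
    pause = (ms) => sleep(ms),   // sleep() is the mongosh builtin
  } = opts;

  let total = 0;
  let batches = 0;
  while (!shouldStop()) {
    // Fetch only _id values for one batch so each delete stays small.
    const ids = coll
      .find(filter, { _id: 1 })
      .limit(batchSize)
      .toArray()
      .map((doc) => doc._id);
    if (ids.length === 0) break;

    total += coll.deleteMany({ _id: { $in: ids } }).deletedCount;
    batches += 1;
    pause(pauseMs);
  }
  return { total, batches };
}

// Example usage in mongosh:
// deleteInBatches(db.events, { archived: true, createdAt: { $lt: ISODate("2025-01-01T00:00:00Z") } });
```

Returning the totals instead of printing them makes it easier to log progress wherever your job runner expects it.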
Be Careful With Index Advice That Sounds Too Simple
Dropping unused indexes is one of the best ways to reduce MongoDB disk space, but "unused" needs context. An index may look unused during a quiet week and still be critical for month-end reports, background reconciliation, or a rare customer support workflow.
Use $indexStats to see access patterns:
db.orders.aggregate([{ $indexStats: {} }])
Then compare the result with application code, scheduled jobs, dashboards, and support queries. If an index has not been used since the last restart, that is a signal, not a verdict. Before dropping it, check whether the server restarted recently and whether the workload sample includes the jobs that matter.
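To make the "signal, not a verdict" rule concrete, the sketch below only flags an index when it has zero recorded operations and the counter has been running for a minimum observation window. It reads the name, accesses.ops, and accesses.since fields from $indexStats output; the function name and the window length are assumptions.

```javascript
// Return names of indexes with no recorded use over at least minObservationDays.
// indexStats is the array produced by db.collection.aggregate([{ $indexStats: {} }]).toArray().
function unusedIndexCandidates(indexStats, minObservationDays, now = new Date()) {
  const dayMs = 24 * 60 * 60 * 1000;
  return indexStats
    .filter((ix) => ix.name !== "_id_") // the _id index cannot be dropped
    .filter((ix) => Number(ix.accesses.ops) === 0)
    // Counters reset on restart, so ignore indexes observed for too short a time.
    .filter((ix) => (now - new Date(ix.accesses.since)) / dayMs >= minObservationDays)
    .map((ix) => ix.name);
}

// Example usage in mongosh:
// unusedIndexCandidates(db.orders.aggregate([{ $indexStats: {} }]).toArray(), 30);
```

The result is a list to investigate against scheduled jobs and reports, not a list to drop.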
Also watch for overlapping compound indexes. If you have these:
{ customerId: 1 }
{ customerId: 1, createdAt: -1 }
{ customerId: 1, createdAt: -1, status: 1 }
you may be able to remove one, but only after checking sort order, query filters, and whether the shorter index supports a different access pattern. MongoDB can use the left prefix of a compound index, but that does not mean the largest index is always a free replacement. Larger indexes cost more memory and write I/O, so keep the one that fits the workload, not the one that looks most complete.
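A small script can at least list the prefix overlaps so you know which pairs to review. The sketch below works on the array returned by getIndexes() and deliberately skips indexes with unique, sparse, partial, or TTL options, since those behave differently from a plain index. The function names are hypothetical, and the output is a review list rather than a drop list.

```javascript
// True when every field of shortKey appears, in the same order and direction,
// at the start of longKey.
function isKeyPrefix(shortKey, longKey) {
  const s = Object.entries(shortKey);
  const l = Object.entries(longKey);
  if (s.length >= l.length) return false;
  return s.every(([field, dir], i) => l[i][0] === field && l[i][1] === dir);
}

// List plain indexes whose key is a left prefix of another index's key.
function prefixCandidates(indexes) {
  const hasSpecialOptions = (ix) =>
    Boolean(ix.unique || ix.sparse || ix.partialFilterExpression) ||
    ix.expireAfterSeconds !== undefined;
  const pairs = [];
  for (const shorter of indexes) {
    if (shorter.name === "_id_" || hasSpecialOptions(shorter)) continue;
    for (const longer of indexes) {
      // A sparse or partial index does not contain every document, so it cannot stand in.
      if (longer === shorter || longer.sparse || longer.partialFilterExpression) continue;
      if (isKeyPrefix(shorter.key, longer.key)) {
        pairs.push({ candidate: shorter.name, coveredBy: longer.name });
      }
    }
  }
  return pairs;
}

// Example usage in mongosh:
// prefixCandidates(db.orders.getIndexes());
```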
Prefer Resync for Big Shrink Operations on Replica Sets
For a large replica set, the cleanest way to reclaim operating-system disk space is often to rebuild one secondary at a time. The basic idea is:
- Confirm you have healthy replication and current backups.
- Remove or stop a secondary.
- Wipe its local data directory.
- Let it resync from the primary or another healthy member.
- Repeat for the next secondary.
- Step down the primary during a maintenance window and rebuild the old primary last.
This approach is slower than running a command, but it is easier to reason about because each rebuilt member writes fresh storage files based on current data. It also avoids trying to compact every collection under production traffic. It is not free: initial sync can be network- and disk-heavy, and you need enough remaining members to keep the replica set safe while one member is rebuilding.
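The "enough remaining members" check can be sketched as a pure function over the members array from rs.status(). It is deliberately conservative: it counts only healthy data-bearing members and assumes every member votes, which is an assumption you should verify against rs.conf(). The function name is hypothetical.

```javascript
// Decide whether one secondary can be taken offline while a majority of
// healthy data-bearing members remains. members is rs.status().members.
function canTakeOneSecondaryOffline(members) {
  const healthy = members.filter(
    (m) => m.health === 1 && (m.stateStr === "PRIMARY" || m.stateStr === "SECONDARY")
  );
  const hasPrimary = healthy.some((m) => m.stateStr === "PRIMARY");
  const hasSecondary = healthy.some((m) => m.stateStr === "SECONDARY");
  // Assumes all members vote; majority is computed over the full member list.
  const majority = Math.floor(members.length / 2) + 1;
  return hasPrimary && hasSecondary && healthy.length - 1 >= majority;
}

// Example usage in mongosh:
// canTakeOneSecondaryOffline(rs.status().members);
```

If this returns false, fix the unhealthy member first or add a temporary member before wiping anything.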
For a standalone MongoDB server, you do not have that luxury. In that case, plan a maintenance window, take a tested backup, and consider mongodump/mongorestore or filesystem-level migration to a fresh volume. Do not choose mongod --repair just because you want a smaller data directory. Treat repair as a recovery tool, not routine housekeeping.
Watch the Oplog, Logs, and Backups Too
Not all MongoDB disk pressure comes from collections. In replica sets, the oplog is a capped collection, so it should not grow forever, but its configured size still matters. If it is too small, a secondary that is offline for maintenance can fall so far behind that it can no longer catch up from the oplog and needs a full resync. If it is much larger than needed on a small disk, it may be wasting space. Review it deliberately:
db.getSiblingDB("local").oplog.rs.stats()
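Size in bytes matters less than how many hours of writes the oplog currently holds. The sketch below compares that window, as reported in the timeDiffHours field of db.getReplicationInfo(), with a planned maintenance duration. The function name and the safety factor are assumptions for illustration.

```javascript
// Check whether the current oplog window covers a planned maintenance period.
// replInfo is the object returned by db.getReplicationInfo().
function oplogCoversWindow(replInfo, plannedHours, safetyFactor = 2) {
  const requiredHours = plannedHours * safetyFactor; // illustrative margin
  return {
    windowHours: replInfo.timeDiffHours,
    requiredHours,
    ok: replInfo.timeDiffHours >= requiredHours,
  };
}

// Example usage in mongosh:
// oplogCoversWindow(db.getReplicationInfo(), 8);
```

Measure the window during a busy period, since a burst of writes shortens it.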
MongoDB logs can also fill a disk when slow query logging, debug verbosity, or an application error loop gets noisy. Use log rotation and keep database logs away from the same tiny volume that stores data whenever possible.
Backups are another common surprise. Teams sometimes run mongodump to the same host because it is convenient, then wonder why disk alerts fire during the backup window. A backup stored on the same filesystem is not much of a backup, and it can push MongoDB into a worse outage during an already risky operation. Stream backups to object storage, a backup server, or a separate mounted volume.
A Practical Runbook for a Full MongoDB Disk
If the disk is already above 90 percent, slow down and work in this order:
- Confirm whether MongoDB is still accepting writes and whether the replica set is healthy.
- Add temporary disk capacity if the platform allows it. This is often safer than emergency deletion.
- Move or rotate oversized logs and local backup files.
- Stop nonessential batch jobs that are writing heavily.
- Identify the largest collections and indexes with db.stats() and collection stats().
- Archive or delete only data with a clear retention rule.
- Plan compaction, resync, or restore after the system is stable.
The best long-term fix is boring: retention rules, index reviews, disk alerts, and tested rebuild procedures. MongoDB is comfortable reusing internal free space, but operators still need to decide what data deserves to live on fast storage and what can move elsewhere.