Step-by-Step Guide: Deploying a Basic MongoDB Sharded Cluster
Deploy a basic MongoDB sharded cluster with config servers, shard replica sets, mongos routers, and sharding verification.
MongoDB, a popular NoSQL document database, excels in handling large volumes of data with high performance and flexibility. However, as data grows, a single server or replica set can reach its scaling limits. This is where sharding comes into play, enabling horizontal scalability by distributing data across multiple servers, or shards.
This guide walks you through a basic MongoDB sharded cluster on localhost for learning. You'll configure config servers, shard replica sets, and a mongos router, then enable sharding for a collection.
Understanding MongoDB Sharded Clusters
A MongoDB sharded cluster consists of three main components that work together to distribute and route data:
- Shard Replica Sets: These are the actual data-bearing nodes. Each shard is a replica set to provide high availability and data redundancy. Data is partitioned across these shards.
- Configuration Servers (Config Servers): These store the cluster's metadata, including the mapping of data chunks to shards. Config servers have been deployable as a replica set (CSRS - Config Server Replica Set) since MongoDB 3.2, and since MongoDB 3.4 a replica set is the only supported deployment, which gives the metadata high availability and consistency.
- mongos Routers: These act as query routers, providing an interface for client applications. A mongos instance directs client operations to the appropriate shard(s) based on the cluster's metadata. Applications connect to mongos, not directly to the shards.
Conceptual diagram of a MongoDB sharded cluster (image credit: MongoDB official documentation)
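To make the division of labor concrete, here is a toy model in plain JavaScript. It is not MongoDB's implementation: the chunk table stands in for the metadata on the config servers, `routeToShard` stands in for the lookup a mongos performs, and the split point of 500 is a made-up value.

```javascript
// Toy chunk table: a stand-in for the metadata kept on the config servers.
// Each chunk covers shard key values in [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 500, shard: "shard01" },
  { min: 500, max: Infinity, shard: "shard02" },
];

// Stand-in for the routing decision of a mongos: find the chunk whose range
// contains the shard key value and return the shard that owns it.
function routeToShard(chunkTable, keyValue) {
  const chunk = chunkTable.find((c) => keyValue >= c.min && keyValue < c.max);
  if (!chunk) {
    throw new Error(`no chunk covers key ${keyValue}`);
  }
  return chunk.shard;
}

console.log(routeToShard(chunks, 42));  // shard01
console.log(routeToShard(chunks, 500)); // shard02
```

The point of the sketch is that the router holds no data itself; it only needs the chunk-to-shard mapping to decide where a request goes.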
Prerequisites
Before you begin, ensure you have the following:
- Multiple Machines/VMs: For a truly distributed sharded cluster, you'll need at least 6-9 machines/VMs/Docker containers. For this basic tutorial, we can simulate this on a single machine using different ports, but remember a production setup requires dedicated resources.
  - 3 for Config Servers (configSrv01, configSrv02, configSrv03)
  - Minimum 2-3 for each Shard (e.g., Shard01-RS01, Shard01-RS02, Shard01-RS03; Shard02-RS01, ...)
  - 1+ for mongos Routers
- MongoDB Installation: Install a supported MongoDB Server version on every machine that will host mongod or mongos. Use mongosh for shell commands.
- Networking: Ensure all machines can communicate with each other on the necessary ports. The defaults are 27017 for mongos (and for a standalone mongod), 27018 for a mongod started with --shardsvr, and 27019 for a mongod started with --configsvr. This guide assigns custom ports instead.
- Directory Structure: Create a dedicated data directory for each mongod instance and a log directory for each mongod and mongos instance.
For simplicity in this guide, we will use localhost with different ports and directories. In a production environment, you would use actual hostnames or IP addresses.
Recommended Directory Structure (Example for localhost setup)
mkdir -p /data/db/configdb01 /data/db/configdb02 /data/db/configdb03
mkdir -p /data/db/shard01-rs01 /data/db/shard01-rs02 /data/db/shard01-rs03
mkdir -p /data/db/shard02-rs01 /data/db/shard02-rs02 /data/db/shard02-rs03
mkdir -p /data/log/config /data/log/shard01 /data/log/shard02 /data/log/mongos
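Because every process in this guide shares one host, a port collision is the easiest mistake to make. The sketch below records the port assignments used in the deployment steps of this guide and checks them for duplicates. The `layout` object and the `findDuplicatePorts` helper are illustrative names, not part of any MongoDB tool.

```javascript
// Localhost layout used in this guide: process group -> assigned ports.
const layout = {
  cfgReplSet: [27019, 27020, 27021],
  shard01: [27030, 27031, 27032],
  shard02: [27040, 27041, 27042],
  mongos: [27017],
};

// Return every port that is assigned more than once across the layout.
function findDuplicatePorts(portLayout) {
  const seen = new Set();
  const duplicates = new Set();
  for (const ports of Object.values(portLayout)) {
    for (const port of ports) {
      if (seen.has(port)) duplicates.add(port);
      seen.add(port);
    }
  }
  return [...duplicates];
}

console.log(findDuplicatePorts(layout)); // []
```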
Deployment Steps
Step 1: Set up Configuration Servers (Config Replica Set)
Configuration servers store metadata for the sharded cluster. They must run as a replica set.
Start mongod instances for config servers: Each instance needs --configsvr and --replSet options.
# Config Server 1
mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb01 --port 27019 --bind_ip localhost --logpath /data/log/config/configdb01.log --fork
# Config Server 2
mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb02 --port 27020 --bind_ip localhost --logpath /data/log/config/configdb02.log --fork
# Config Server 3
mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb03 --port 27021 --bind_ip localhost --logpath /data/log/config/configdb03.log --fork
Tip: For production, replace localhost with actual IP addresses or hostnames.
Initialize the Config Replica Set: Connect to one of the config server instances and initialize the replica set.
mongosh --port 27019
Inside the mongo shell:
rs.initiate({
  _id: "cfgReplSet",
  configsvr: true,
  members: [
    { _id: 0, host: "localhost:27019" },
    { _id: 1, host: "localhost:27020" },
    { _id: 2, host: "localhost:27021" }
  ]
});
Verify the status:
rs.status();
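The output of rs.status() is long, so it can help to reduce it to the two facts that matter at this stage: which member is primary and whether all members are healthy. The helper below is a minimal sketch that reads the `set`, `members[].name`, `members[].stateStr`, and `members[].health` fields of the status document. The function name and the hand-written sample are assumptions for illustration.

```javascript
// Summarize an rs.status()-shaped document: which member is primary and
// whether every member reports as healthy.
function summarizeReplSet(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return {
    set: status.set,
    primary: primary ? primary.name : null,
    allHealthy: status.members.every((m) => m.health === 1),
  };
}

// Hand-written sample that reuses field names from rs.status() output.
const sampleStatus = {
  set: "cfgReplSet",
  members: [
    { name: "localhost:27019", stateStr: "PRIMARY", health: 1 },
    { name: "localhost:27020", stateStr: "SECONDARY", health: 1 },
    { name: "localhost:27021", stateStr: "SECONDARY", health: 1 },
  ],
};

console.log(summarizeReplSet(sampleStatus));
// In mongosh the call would be: summarizeReplSet(rs.status())
```

A `primary` of null shortly after rs.initiate() usually just means the election has not finished yet, so it is worth re-running the check after a few seconds.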
Step 2: Set up Shard Replica Sets
Each shard in the cluster is a replica set. We'll set up two shards (shard01 and shard02), each with three members.
Start mongod instances for Shard 1 members: Each instance needs --shardsvr and --replSet options.
# Shard 1 Member 1
mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs01 --port 27030 --bind_ip localhost --logpath /data/log/shard01/shard01-rs01.log --fork
# Shard 1 Member 2
mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs02 --port 27031 --bind_ip localhost --logpath /data/log/shard01/shard01-rs02.log --fork
# Shard 1 Member 3
mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs03 --port 27032 --bind_ip localhost --logpath /data/log/shard01/shard01-rs03.log --fork
Initialize Shard 1 Replica Set: Connect to one of Shard 1 instances.
mongosh --port 27030
Inside the mongo shell:
rs.initiate({
  _id: "shard01",
  members: [
    { _id: 0, host: "localhost:27030" },
    { _id: 1, host: "localhost:27031" },
    { _id: 2, host: "localhost:27032" }
  ]
});
Start mongod instances for Shard 2 members (repeat for additional shards):
# Shard 2 Member 1
mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs01 --port 27040 --bind_ip localhost --logpath /data/log/shard02/shard02-rs01.log --fork
# Shard 2 Member 2
mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs02 --port 27041 --bind_ip localhost --logpath /data/log/shard02/shard02-rs02.log --fork
# Shard 2 Member 3
mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs03 --port 27042 --bind_ip localhost --logpath /data/log/shard02/shard02-rs03.log --fork
Initialize Shard 2 Replica Set: Connect to one of Shard 2 instances.
mongosh --port 27040
Inside the mongo shell:
rs.initiate({
  _id: "shard02",
  members: [
    { _id: 0, host: "localhost:27040" },
    { _id: 1, host: "localhost:27041" },
    { _id: 2, host: "localhost:27042" }
  ]
});
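The rs.initiate() documents for the config servers and for each shard differ only in the set name, the host list, and the configsvr flag. If you add more shards, a small builder can reduce copy-and-paste mistakes. This is a sketch and `buildReplSetConfig` is a hypothetical helper name; it only produces the document and does not call the server.

```javascript
// Build the document passed to rs.initiate() for one replica set.
// `extra` carries optional top-level fields such as { configsvr: true }.
function buildReplSetConfig(name, hosts, extra = {}) {
  return {
    _id: name,
    ...extra,
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const shard01Config = buildReplSetConfig("shard01", [
  "localhost:27030",
  "localhost:27031",
  "localhost:27032",
]);

const cfgConfig = buildReplSetConfig(
  "cfgReplSet",
  ["localhost:27019", "localhost:27020", "localhost:27021"],
  { configsvr: true }
);

console.log(JSON.stringify(shard01Config, null, 2));
console.log(JSON.stringify(cfgConfig, null, 2));
// In mongosh, connected to a member of the set: rs.initiate(shard01Config)
```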
Step 3: Set up mongos Routers
mongos instances are the entry points for client applications. They need to know where the config servers are.
Start mongos instances: Provide the --configdb option, listing the config replica set members.
# Mongos Router 1
mongos --configdb cfgReplSet/localhost:27019,localhost:27020,localhost:27021 --port 27017 --bind_ip localhost --logpath /data/log/mongos/mongos01.log --fork
Note: You can start multiple mongos instances for load balancing and high availability. They all connect to the same config servers.
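Every mongos you start needs the same --configdb value, so it is worth deriving it once. The sketch below formats the `<replSetName>/<host>,<host>,...` string and assembles the arguments for an additional router. The helper names, port 27018, and the mongos02 log file are hypothetical values chosen for the example.

```javascript
// Format "<replSetName>/<host1>,<host2>,..." as expected by mongos --configdb.
function configDbString(replSetName, hosts) {
  if (hosts.length === 0) {
    throw new Error("at least one config server host is required");
  }
  return `${replSetName}/${hosts.join(",")}`;
}

// Assemble the argument list for one mongos process, using the same flags
// as the command in this guide.
function mongosArgs(configDb, port, logPath) {
  return [
    "--configdb", configDb,
    "--port", String(port),
    "--bind_ip", "localhost",
    "--logpath", logPath,
    "--fork",
  ];
}

const configDb = configDbString("cfgReplSet", [
  "localhost:27019",
  "localhost:27020",
  "localhost:27021",
]);

// Hypothetical second router on port 27018.
const secondRouter = mongosArgs(configDb, 27018, "/data/log/mongos/mongos02.log");
console.log(["mongos", ...secondRouter].join(" "));
```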
Step 4: Connect to mongos and Add Shards
Now, connect to a mongos instance and add the shard replica sets to the cluster.
Connect to mongos: Use the default MongoDB port 27017 or the custom port you specified for mongos.
mongosh --port 27017
Add Shards: Use the sh.addShard() command, specifying the replica set name and one of its members.
sh.addShard("shard01/localhost:27030");
sh.addShard("shard02/localhost:27040");
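With more than a couple of shards, the seed strings can be generated from a single description of the cluster. The sketch below builds one `<replSetName>/<host>,...` string per shard. The `shards` object and `shardSeed` helper are illustrative names, and listing all members (rather than one, as above) is a choice made for the example.

```javascript
// Shard replica sets from this guide: name -> member hosts.
const shards = {
  shard01: ["localhost:27030", "localhost:27031", "localhost:27032"],
  shard02: ["localhost:27040", "localhost:27041", "localhost:27042"],
};

// Build the "<replSetName>/<host1>,<host2>,..." string for sh.addShard().
function shardSeed(name, hosts) {
  return `${name}/${hosts.join(",")}`;
}

const seeds = Object.entries(shards).map(([name, hosts]) => shardSeed(name, hosts));
console.log(seeds);

// In mongosh connected to mongos, the seeds could then be applied with:
//   seeds.forEach((seed) => printjson(sh.addShard(seed)));
```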
Step 5: Enable Sharding for a Database and Collection
Once shards are added, you need to enable sharding for specific databases and then for specific collections within those databases. This requires choosing a shard key.
Enable Sharding for a Database: Switch to the database you want to shard and run sh.enableSharding().
use mydatabase;
sh.enableSharding("mydatabase");
Shard a Collection: Choose a shard key and use sh.shardCollection().
Warning: Choosing an effective shard key is crucial for performance and even distribution. A poor shard key can lead to hot spots or inefficient queries. Common strategies include hashed keys, ranged keys, or compound keys.
For this example, let's assume a collection mycollection with a field _id.
sh.shardCollection("mydatabase.mycollection", { _id: "hashed" });
A hashed _id shard key is simple for a tutorial because it spreads inserts across shards better than a monotonically increasing ranged _id key. For a real application, choose the shard key from your query patterns and write distribution, not from convenience alone.
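The difference between a ranged and a hashed key on monotonically increasing values can be seen in a toy simulation. This sketch makes two loud assumptions: the ranged case uses a single made-up split point at 1000, and the hashed case uses FNV-1a as a stand-in hash, because MongoDB's hashed index uses its own function. Real placements will therefore differ; only the pattern carries over.

```javascript
const SHARDS = ["shard01", "shard02"];

// Ranged placement with one hypothetical split point at 1000:
// keys below it live on shard01, everything else on shard02.
function rangedShard(id) {
  return id < 1000 ? SHARDS[0] : SHARDS[1];
}

// FNV-1a (32-bit), used here only as a stand-in hash function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hashed placement: pick a shard from the hash of the key.
function hashedShard(id) {
  return SHARDS[fnv1a(String(id)) % SHARDS.length];
}

// Count how many ids land on each shard under a placement function.
function countByShard(ids, place) {
  const counts = Object.fromEntries(SHARDS.map((s) => [s, 0]));
  for (const id of ids) counts[place(id)] += 1;
  return counts;
}

// 1000 monotonically increasing ids, all at or above the split point.
const ids = Array.from({ length: 1000 }, (_, i) => 1000 + i);
const rangedCounts = countByShard(ids, rangedShard);
const hashedCounts = countByShard(ids, hashedShard);
console.log("ranged:", rangedCounts);
console.log("hashed:", hashedCounts);
```

In the ranged case every new insert lands on the shard that owns the highest range, which is the hot spot the warning above describes. In the hashed case neighboring ids scatter across both shards.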
Step 6: Verify the Cluster
Run these commands from mongosh connected to mongos:
sh.status();
db.adminCommand({ listShards: 1 });
Then insert sample documents and check that the sharded collection exists:
use mydatabase;
db.mycollection.insertMany([
{ _id: 1, name: "alpha" },
{ _id: 2, name: "beta" },
{ _id: 3, name: "gamma" }
]);
db.mycollection.getShardDistribution();
Small test datasets may not split immediately, so do not expect perfect distribution after only a few documents. The important first check is that sh.status() lists both shards and shows mydatabase.mycollection as sharded.
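The shard check can also be scripted. The sketch below compares the `shards` array of a listShards result against the shard names you expect. The `missingShards` helper and the hand-written sample result are assumptions for illustration; only the `shards[]._id` field is read.

```javascript
// Return the expected shard names that are absent from a listShards result.
function missingShards(listShardsResult, expected) {
  const present = new Set(listShardsResult.shards.map((s) => s._id));
  return expected.filter((name) => !present.has(name));
}

// Hand-written sample in the shape of a listShards result.
const sampleResult = {
  shards: [
    { _id: "shard01", host: "shard01/localhost:27030,localhost:27031,localhost:27032", state: 1 },
    { _id: "shard02", host: "shard02/localhost:27040,localhost:27041,localhost:27042", state: 1 },
  ],
  ok: 1,
};

console.log(missingShards(sampleResult, ["shard01", "shard02"])); // []
// In mongosh connected to mongos:
//   missingShards(db.adminCommand({ listShards: 1 }), ["shard01", "shard02"])
```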
Production Notes
This localhost setup is useful for learning the moving parts, but production needs more care:
- Use real hostnames, not localhost, because replica set member names are stored in cluster metadata.
- Run config servers as a three-member replica set.
- Run each shard as a replica set with members spread across failure domains.
- Enable authentication and internal keyfile or x.509 authentication before exposing the cluster.
- Back up config server metadata and shard data as part of one cluster-aware backup plan.
- Monitor chunk distribution, balancer activity, replication lag, and disk growth.
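As one example of the monitoring item, replication lag can be estimated from the `optimeDate` that rs.status() reports for each member. The sketch below is a starting point rather than a monitoring solution. The `replicationLagSeconds` helper and the timestamps in the sample are made up for illustration.

```javascript
// For each secondary, compute how far its last applied operation trails the
// primary's, in seconds, using the optimeDate field of rs.status() members.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary in status document");
  }
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Hand-written sample with made-up timestamps.
const lagSample = {
  members: [
    { name: "localhost:27030", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "localhost:27031", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:08Z") },
    { name: "localhost:27032", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
  ],
};

const lagReport = replicationLagSeconds(lagSample);
console.log(lagReport);
// In mongosh, connected to a shard member: replicationLagSeconds(rs.status())
```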
Final Takeaway
A MongoDB sharded cluster has three jobs: config servers track metadata, shard replica sets store data, and mongos routes client traffic. Get those roles working first, then spend most of your design time on the shard key, because that choice determines whether your cluster spreads load cleanly or creates hot shards.