Step-by-Step Guide: Deploying a Basic MongoDB Sharded Cluster
MongoDB, a popular NoSQL document database, excels in handling large volumes of data with high performance and flexibility. However, as data grows, a single server or replica set can reach its scaling limits. This is where sharding comes into play, enabling horizontal scalability by distributing data across multiple servers, or shards.
This comprehensive guide will walk you through the complete process of setting up a functional MongoDB sharded cluster. You'll learn how to configure the essential components: configuration servers (config servers), mongos routers, and shard replica sets. By the end of this tutorial, you'll have a foundational understanding and practical experience in deploying a sharded cluster designed for high horizontal scalability and availability.
Understanding MongoDB Sharded Clusters
A MongoDB sharded cluster consists of three main components that work together to distribute and route data:
- Shard Replica Sets: These are the actual data-bearing nodes. Each shard is a replica set to provide high availability and data redundancy. Data is partitioned across these shards.
- Configuration Servers (Config Servers): These store the cluster's metadata, including the mapping of data chunks to shards. Starting with MongoDB 3.2, config servers must be deployed as a replica set (CSRS - Config Server Replica Set) for high availability and consistency.
mongosRouters: These act as query routers, providing an interface for client applications. Amongosinstance directs client operations to the appropriate shard(s) based on the cluster's metadata. Applications connect tomongos, not directly to the shards.

Conceptual diagram of a MongoDB sharded cluster (image credit: MongoDB official documentation)
Prerequisites
Before you begin, ensure you have the following:
- Multiple Machines/VMs: For a truly distributed sharded cluster, you'll need at least 6-9 machines/VMs/Docker containers. For this basic tutorial, we can simulate this on a single machine using different ports, but remember a production setup requires dedicated resources.
- 3 for Config Servers (configSrv01, configSrv02, configSrv03)
- Minimum 2-3 for each Shard (e.g., Shard01-RS01, Shard01-RS02, Shard01-RS03; Shard02-RS01, ...)
- 1+ for
mongosRouters
- MongoDB Installation: MongoDB 4.2+ installed on all machines that will host
mongodormongosinstances. You can find installation instructions on the MongoDB Documentation. - Networking: Ensure all machines can communicate with each other on the necessary ports (default
27017,27018,27019,27020for config servers, shards, andmongosrespectively, or custom ports). - Directory Structure: Create dedicated data and log directories for each
mongodandmongosinstance.
For simplicity in this guide, we will use localhost with different ports and directories. In a production environment, you would use actual hostnames or IP addresses.
Recommended Directory Structure (Example for localhost setup)
mkdir -p /data/db/configdb01 /data/db/configdb02 /data/db/configdb03
mkdir -p /data/db/shard01-rs01 /data/db/shard01-rs02 /data/db/shard01-rs03
mkdir -p /data/db/shard02-rs01 /data/db/shard02-rs02 /data/db/shard02-rs03
mkdir -p /data/log/config /data/log/shard01 /data/log/shard02 /data/log/mongos
Deployment Steps
Step 1: Set up Configuration Servers (Config Replica Set)
Configuration servers store metadata for the sharded cluster. They must run as a replica set.
-
Start
mongodinstances for config servers: Each instance needs--configsvrand--replSetoptions.```bash
Config Server 1
mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb01 --port 27019 --bind_ip localhost --logpath /data/log/config/configdb01.log --fork
Config Server 2
mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb02 --port 27020 --bind_ip localhost --logpath /data/log/config/configdb02.log --fork
Config Server 3
mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb03 --port 27021 --bind_ip localhost --logpath /data/log/config/configdb03.log --fork
```Tip: For production, replace
localhostwith actual IP addresses or hostnames. -
Initialize the Config Replica Set: Connect to one of the config server instances and initialize the replica set.
bash mongo --port 27019Inside the mongo shell:
javascript rs.initiate({ _id: "cfgReplSet", configsvr: true, members: [ { _id : 0, host : "localhost:27019" }, { _id : 1, host : "localhost:27020" }, { _id : 2, host : "localhost:27021" } ] });Verify the status:
javascript rs.status();
Step 2: Set up Shard Replica Sets
Each shard in the cluster is a replica set. We'll set up two shards (shard01 and shard02), each with three members.
-
Start
mongodinstances for Shard 1 members: Each instance needs--shardsvrand--replSetoptions.```bash
Shard 1 Member 1
mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs01 --port 27030 --bind_ip localhost --logpath /data/log/shard01/shard01-rs01.log --fork
Shard 1 Member 2
mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs02 --port 27031 --bind_ip localhost --logpath /data/log/shard01/shard01-rs02.log --fork
Shard 1 Member 3
mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs03 --port 27032 --bind_ip localhost --logpath /data/log/shard01/shard01-rs03.log --fork
``` -
Initialize Shard 1 Replica Set: Connect to one of Shard 1 instances.
bash mongo --port 27030Inside the mongo shell:
javascript rs.initiate({ _id : "shard01", members: [ { _id : 0, host : "localhost:27030" }, { _id : 1, host : "localhost:27031" }, { _id : 2, host : "localhost:27032" } ] }); -
Start
mongodinstances for Shard 2 members (repeat for additional shards):```bash
Shard 2 Member 1
mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs01 --port 27040 --bind_ip localhost --logpath /data/log/shard02/shard02-rs01.log --fork
Shard 2 Member 2
mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs02 --port 27041 --bind_ip localhost --logpath /data/log/shard02/shard02-rs02.log --fork
Shard 2 Member 3
mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs03 --port 27042 --bind_ip localhost --logpath /data/log/shard02/shard02-rs03.log --fork
``` -
Initialize Shard 2 Replica Set: Connect to one of Shard 2 instances.
bash mongo --port 27040Inside the mongo shell:
javascript rs.initiate({ _id : "shard02", members: [ { _id : 0, host : "localhost:27040" }, { _id : 1, host : "localhost:27041" }, { _id : 2, host : "localhost:27042" } ] });
Step 3: Set up mongos Routers
mongos instances are the entry points for client applications. They need to know where the config servers are.
-
Start
mongosinstances: Provide the--configdboption, listing the config replica set members.```bash
Mongos Router 1
mongos --configdb cfgReplSet/localhost:27019,localhost:27020,localhost:27021 --port 27017 --bind_ip localhost --logpath /data/log/mongos/mongos01.log --fork
```Note: You can start multiple
mongosinstances for load balancing and high availability. They all connect to the same config servers.
Step 4: Connect to mongos and Add Shards
Now, connect to a mongos instance and add the shard replica sets to the cluster.
-
Connect to
mongos: Use the default MongoDB port27017or the custom port you specified formongos.bash mongo --port 27017 -
Add Shards: Use the
sh.addShard()command, specifying the replica set name and one of its members.javascript sh.addShard("shard01/localhost:27030"); sh.addShard("shard02/localhost:27040");
Step 5: Enable Sharding for a Database and Collection
Once shards are added, you need to enable sharding for specific databases and then for specific collections within those databases. This requires choosing a shard key.
-
Enable Sharding for a Database: Switch to the database you want to shard and run
sh.enableSharding().javascript use mydatabase; sh.enableSharding("mydatabase"); -
Shard a Collection: Choose a
shard keyand usesh.shardCollection().Warning: Choosing an effective shard key is crucial for performance and even distribution. A poor shard key can lead to hot spots or inefficient queries. Common strategies include hashed keys, ranged keys, or compound keys.
For this example, let's assume a collection
mycollectionwith a field_id.```javascript
sh.shardCollection("mydatabase.mycollection"