Step-by-Step Guide: Deploying a Basic MongoDB Sharded Cluster

Deploy a basic MongoDB sharded cluster with config servers, shard replica sets, mongos routers, and sharding verification.

Step-by-Step Guide: Deploying a Basic MongoDB Sharded Cluster

MongoDB, a popular NoSQL document database, excels in handling large volumes of data with high performance and flexibility. However, as data grows, a single server or replica set can reach its scaling limits. This is where sharding comes into play, enabling horizontal scalability by distributing data across multiple servers, or shards.

This guide walks you through a basic MongoDB sharded cluster on localhost for learning. You'll configure config servers, shard replica sets, and a mongos router, then enable sharding for a collection.

Understanding MongoDB Sharded Clusters

A MongoDB sharded cluster consists of three main components that work together to distribute and route data:

  • Shard Replica Sets: These are the actual data-bearing nodes. Each shard is a replica set to provide high availability and data redundancy. Data is partitioned across these shards.
  • Configuration Servers (Config Servers): These store the cluster's metadata, including the mapping of data chunks to shards. Starting with MongoDB 3.2, config servers must be deployed as a replica set (CSRS - Config Server Replica Set) for high availability and consistency.
  • mongos Routers: These act as query routers, providing an interface for client applications. A mongos instance directs client operations to the appropriate shard(s) based on the cluster's metadata. Applications connect to mongos, not directly to the shards.

MongoDB Sharded Cluster Architecture Diagram (Conceptual) Conceptual diagram of a MongoDB sharded cluster (image credit: MongoDB official documentation)

Prerequisites

Before you begin, ensure you have the following:

  1. Multiple Machines/VMs: For a truly distributed sharded cluster, you'll need at least 6-9 machines/VMs/Docker containers. For this basic tutorial, we can simulate this on a single machine using different ports, but remember a production setup requires dedicated resources.
    • 3 for Config Servers (configSrv01, configSrv02, configSrv03)
    • Minimum 2-3 for each Shard (e.g., Shard01-RS01, Shard01-RS02, Shard01-RS03; Shard02-RS01, ...)
    • 1+ for mongos Routers
  2. MongoDB Installation: Install a supported MongoDB Server version on every machine that will host mongod or mongos. Use mongosh for shell commands.
  3. Networking: Ensure all machines can communicate with each other on the necessary ports (default 27017, 27018, 27019, 27020 for config servers, shards, and mongos respectively, or custom ports).
  4. Directory Structure: Create dedicated data and log directories for each mongod and mongos instance.

For simplicity in this guide, we will use localhost with different ports and directories. In a production environment, you would use actual hostnames or IP addresses.

Recommended Directory Structure (Example for localhost setup)

mkdir -p /data/db/configdb01 /data/db/configdb02 /data/db/configdb03
mkdir -p /data/db/shard01-rs01 /data/db/shard01-rs02 /data/db/shard01-rs03
mkdir -p /data/db/shard02-rs01 /data/db/shard02-rs02 /data/db/shard02-rs03
mkdir -p /data/log/config /data/log/shard01 /data/log/shard02 /data/log/mongos

Deployment Steps

Step 1: Set up Configuration Servers (Config Replica Set)

Configuration servers store metadata for the sharded cluster. They must run as a replica set.

  1. Start mongod instances for config servers: Each instance needs --configsvr and --replSet options.

    # Config Server 1
    mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb01 --port 27019 --bind_ip localhost --logpath /data/log/config/configdb01.log --fork
    
    # Config Server 2
    mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb02 --port 27020 --bind_ip localhost --logpath /data/log/config/configdb02.log --fork
    
    # Config Server 3
    mongod --configsvr --replSet cfgReplSet --dbpath /data/db/configdb03 --port 27021 --bind_ip localhost --logpath /data/log/config/configdb03.log --fork
    

    Tip: For production, replace localhost with actual IP addresses or hostnames.

  2. Initialize the Config Replica Set: Connect to one of the config server instances and initialize the replica set.

    mongosh --port 27019
    

    Inside the mongo shell:

    rs.initiate({
       _id: "cfgReplSet",
       configsvr: true,
       members: [
          { _id : 0, host : "localhost:27019" },
          { _id : 1, host : "localhost:27020" },
          { _id : 2, host : "localhost:27021" }
       ]
    });
    

    Verify the status:

    rs.status();
    

Step 2: Set up Shard Replica Sets

Each shard in the cluster is a replica set. We'll set up two shards (shard01 and shard02), each with three members.

  1. Start mongod instances for Shard 1 members: Each instance needs --shardsvr and --replSet options.

    # Shard 1 Member 1
    mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs01 --port 27030 --bind_ip localhost --logpath /data/log/shard01/shard01-rs01.log --fork
    
    # Shard 1 Member 2
    mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs02 --port 27031 --bind_ip localhost --logpath /data/log/shard01/shard01-rs02.log --fork
    
    # Shard 1 Member 3
    mongod --shardsvr --replSet shard01 --dbpath /data/db/shard01-rs03 --port 27032 --bind_ip localhost --logpath /data/log/shard01/shard01-rs03.log --fork
    
  2. Initialize Shard 1 Replica Set: Connect to one of Shard 1 instances.

    mongosh --port 27030
    

    Inside the mongo shell:

    rs.initiate({
       _id : "shard01",
       members: [
          { _id : 0, host : "localhost:27030" },
          { _id : 1, host : "localhost:27031" },
          { _id : 2, host : "localhost:27032" }
       ]
    });
    
  3. Start mongod instances for Shard 2 members (repeat for additional shards):

    # Shard 2 Member 1
    mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs01 --port 27040 --bind_ip localhost --logpath /data/log/shard02/shard02-rs01.log --fork
    
    # Shard 2 Member 2
    mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs02 --port 27041 --bind_ip localhost --logpath /data/log/shard02/shard02-rs02.log --fork
    
    # Shard 2 Member 3
    mongod --shardsvr --replSet shard02 --dbpath /data/db/shard02-rs03 --port 27042 --bind_ip localhost --logpath /data/log/shard02/shard02-rs03.log --fork
    
  4. Initialize Shard 2 Replica Set: Connect to one of Shard 2 instances.

    mongosh --port 27040
    

    Inside the mongo shell:

    rs.initiate({
       _id : "shard02",
       members: [
          { _id : 0, host : "localhost:27040" },
          { _id : 1, host : "localhost:27041" },
          { _id : 2, host : "localhost:27042" }
       ]
    });
    

Step 3: Set up mongos Routers

mongos instances are the entry points for client applications. They need to know where the config servers are.

  1. Start mongos instances: Provide the --configdb option, listing the config replica set members.

    # Mongos Router 1
    mongos --configdb cfgReplSet/localhost:27019,localhost:27020,localhost:27021 --port 27017 --bind_ip localhost --logpath /data/log/mongos/mongos01.log --fork
    

    Note: You can start multiple mongos instances for load balancing and high availability. They all connect to the same config servers.

Step 4: Connect to mongos and Add Shards

Now, connect to a mongos instance and add the shard replica sets to the cluster.

  1. Connect to mongos: Use the default MongoDB port 27017 or the custom port you specified for mongos.

    mongosh --port 27017
    
  2. Add Shards: Use the sh.addShard() command, specifying the replica set name and one of its members.

    sh.addShard("shard01/localhost:27030");
    sh.addShard("shard02/localhost:27040");
    

Step 5: Enable Sharding for a Database and Collection

Once shards are added, you need to enable sharding for specific databases and then for specific collections within those databases. This requires choosing a shard key.

  1. Enable Sharding for a Database: Switch to the database you want to shard and run sh.enableSharding().

    use mydatabase;
    sh.enableSharding("mydatabase");
    
  2. Shard a Collection: Choose a shard key and use sh.shardCollection().

    Warning: Choosing an effective shard key is crucial for performance and even distribution. A poor shard key can lead to hot spots or inefficient queries. Common strategies include hashed keys, ranged keys, or compound keys.

    For this example, let's assume a collection mycollection with a field _id.

    sh.shardCollection("mydatabase.mycollection", { _id: "hashed" });
    

    A hashed _id shard key is simple for a tutorial because it spreads inserts across shards better than a monotonically increasing ranged _id key. For a real application, choose the shard key from your query patterns and write distribution, not from convenience alone.

Step 6: Verify the Cluster

Run these commands from mongosh connected to mongos:

sh.status();
db.adminCommand({ listShards: 1 });

Then insert sample documents and check that the sharded collection exists:

use mydatabase;
db.mycollection.insertMany([
  { _id: 1, name: "alpha" },
  { _id: 2, name: "beta" },
  { _id: 3, name: "gamma" }
]);

db.mycollection.getShardDistribution();

Small test datasets may not split immediately, so do not expect perfect distribution after only a few documents. The important first check is that sh.status() lists both shards and shows mydatabase.mycollection as sharded.

Production Notes

This localhost setup is useful for learning the moving parts, but production needs more care:

  • Use real hostnames, not localhost, because replica set member names are stored in cluster metadata.
  • Run config servers as a three-member replica set.
  • Run each shard as a replica set with members spread across failure domains.
  • Enable authentication and internal keyfile or x.509 authentication before exposing the cluster.
  • Back up config server metadata and shard data as part of one cluster-aware backup plan.
  • Monitor chunk distribution, balancer activity, replication lag, and disk growth.

Final Takeaway

A MongoDB sharded cluster has three jobs: config servers track metadata, shard replica sets store data, and mongos routes client traffic. Get those roles working first, then spend most of your design time on the shard key, because that choice determines whether your cluster spreads load cleanly or creates hot shards.