Step-by-Step Guide to Configuring a Basic Three-Node Cluster

Learn how to quickly set up a resilient, basic three-node Elasticsearch cluster. This step-by-step tutorial covers essential configuration in `elasticsearch.yml`, bootstrapping cluster discovery using `cluster.initial_master_nodes`, starting the services, and verifying health and shard replication across the nodes using practical cURL commands.


Setting up a resilient Elasticsearch cluster is fundamental for achieving high availability and horizontal scalability in your search and analytics infrastructure. A three-node cluster provides an excellent starting point, offering the redundancy necessary to withstand the failure of a single node without service interruption. This comprehensive guide will walk you through the process of installing, configuring, and verifying a basic three-node Elasticsearch cluster, ideal for development environments or small-scale production deployments.

By the end of this tutorial, you will have a functional cluster where data can be distributed and replicated safely, leveraging Elasticsearch's core distributed capabilities.


Prerequisites

Before beginning the configuration, ensure you have the following in place:

  1. Three Separate Servers/VMs: Each will host one node. For this guide, we assume you have three distinct machines or Docker containers ready.
  2. Java Development Kit (JDK): Recent Elasticsearch releases ship with a bundled JDK, so a separate installation is usually unnecessary; if you use your own JDK, it must be a version compatible with your Elasticsearch release (e.g., JDK 17) on all nodes.
  3. Network Connectivity: Ensure that all three nodes can communicate with each other over the necessary ports (default HTTP port: 9200, default transport port: 9300); a quick connectivity check follows this list.
  4. Identical Elasticsearch Installation: Download and extract the same version of Elasticsearch onto all three nodes.
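
Before moving on, you can sanity-check the network prerequisite directly. Below is a minimal sketch using bash's built-in /dev/tcp redirection; note that a port probe only succeeds once something is actually listening on 9300, so you may want to re-run it after Step 3:

```bash
# Run from node1: probe the transport port on the other two nodes
# (node2_ip / node3_ip are the placeholder addresses used throughout this guide)
for host in node2_ip node3_ip; do
  if timeout 3 bash -c "</dev/tcp/${host}/9300"; then
    echo "${host}:9300 reachable"
  else
    echo "${host}:9300 UNREACHABLE"
  fi
done
```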

Step 1: Configuring Each Node's elasticsearch.yml

The configuration file, elasticsearch.yml, located in the config/ directory of your Elasticsearch installation, is crucial for defining how each node behaves within the cluster. You must adjust settings specific to each node.

All three nodes will carry the same roles: for a three-node cluster, it is common practice to let every node hold the master, data, and ingest roles, which keeps the setup simple while preserving redundancy.

Common Settings for All Nodes

Ensure these settings are identical across all three configuration files:

# Cluster Name: Must be the same on all nodes
cluster.name: my-three-node-cluster

# Discovery Settings (Crucial for initial joining)
# Use a seed list of known nodes to bootstrap discovery
discovery.seed_hosts: ["node1_ip:9300", "node2_ip:9300", "node3_ip:9300"]

# Bootstrap the very first cluster formation
# (entries must match each node's node.name exactly)
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# Network Settings (Ensure binding to the correct IP)
network.host: 0.0.0.0 # Or the specific private IP of the host

# HTTP Port (External access)
http.port: 9200

# Transport Port (Internal cluster communication)
transport.port: 9300

Unique Settings Per Node

Each node requires a unique node.name and potentially a unique path.data if running on the same machine or sharing storage.

Node 1 Configuration (node1_ip)

# Unique Identifier for Node 1
node.name: node-1

# If paths differ
# path.data: /var/lib/elasticsearch/data_node1

Node 2 Configuration (node2_ip)

# Unique Identifier for Node 2
node.name: node-2

Node 3 Configuration (node3_ip)

# Unique Identifier for Node 3
node.name: node-3
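
If you manage all three machines from one workstation, a short script can stamp out the per-node setting. A minimal sketch, assuming passwordless SSH access and a package-install config path of /etc/elasticsearch/elasticsearch.yml (adjust the path for archive installs, and assume the common settings have already been copied to every host):

```bash
#!/usr/bin/env bash
# Append each node's unique node.name to its elasticsearch.yml.
HOSTS=("node1_ip" "node2_ip" "node3_ip")
CONFIG=/etc/elasticsearch/elasticsearch.yml   # assumed location; adjust as needed
for i in "${!HOSTS[@]}"; do
  n=$((i + 1))
  ssh "${HOSTS[$i]}" "echo 'node.name: node-${n}' | sudo tee -a ${CONFIG} > /dev/null"
done
```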

Important Note on cluster.initial_master_nodes: This setting is only used the very first time the cluster forms. Once the cluster has bootstrapped, Elasticsearch persists the cluster state to disk and manages master election internally, so the setting is ignored on subsequent restarts, even after a full cluster shutdown. Elastic recommends removing it from the configuration after the first successful formation.


Step 2: Defining Node Roles

While the default configuration allows nodes to take on all roles (master, data, ingest, coordinating), larger deployments typically separate these roles onto dedicated machines. For a robust three-node setup, we keep all three nodes eligible to become master.

Add the following roles configuration to all three elasticsearch.yml files:

# Enable all standard roles on all nodes for this initial setup
node.roles: [ master, data, ingest, remote_cluster_client ]
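
Once the cluster is running (Step 3), you can confirm that each node picked up these roles. A quick check via the cat nodes API:

```bash
# List each node's name and compact role letters
# (m = master-eligible, d = data, i = ingest, r = remote_cluster_client)
curl -X GET "http://node1_ip:9200/_cat/nodes?v&h=name,node.role"
```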

Handling Quorum for Resilience

With three master-eligible nodes, the cluster can tolerate the loss of one node while maintaining a voting quorum (2 of 3 nodes remain). Since Elasticsearch 7.0, the voting configuration is managed automatically by the cluster itself; the cluster.initial_master_nodes list from Step 1 only bootstraps the very first election.
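
If you want to see the quorum machinery directly, the cluster state exposes the committed voting configuration. A sketch (filter_path just trims the response to the relevant field):

```bash
# Show the node IDs currently committed to the voting configuration
curl -X GET "http://node1_ip:9200/_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config&pretty"
```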


Step 3: Starting the Cluster Nodes

Start Elasticsearch on each node. The start order does not matter: the cluster will not elect its first master until enough of the nodes listed in cluster.initial_master_nodes have discovered each other, so nodes started early simply wait for their peers to join.

On Node 1, Node 2, and Node 3:

Navigate to your Elasticsearch installation directory and run:

# For running in the foreground (useful for debugging)
bin/elasticsearch

# For running in the background (production recommended)
bin/elasticsearch -d

Monitor the logs on each node for successful startup messages, particularly those indicating the node has joined the cluster. Note that the main log file is named after cluster.name, so here it is logs/my-three-node-cluster.log rather than logs/elasticsearch.log.
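
To watch a node join in real time, you can follow its log and filter for cluster-formation messages. A rough sketch; the exact log phrases vary between Elasticsearch versions, so treat the grep pattern as a starting point:

```bash
# Follow the log on any node and surface election/join events
tail -f logs/my-three-node-cluster.log | grep -E "elected-as-master|node-join|started"
```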


Step 4: Verifying Cluster Health

Once all nodes have started, use the _cat/health API, accessible via any node's HTTP port (default 9200), to confirm the cluster status.

Access this from a machine that can reach the nodes (e.g., via curl):

Check Health:

curl -X GET "http://node1_ip:9200/_cat/health?v"

Expected Output Snippet:

epoch      timestamp cluster               status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1701331200 12:00:00  my-three-node-cluster green           3         3      0   0    0    0        0             0                  -                100.0%

If status is green and node.total is 3, your cluster is up and running correctly.
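
For scripting, the _cluster/health endpoint can also block until the desired status is reached, which avoids polling in a loop:

```bash
# Block for up to 30 seconds until the cluster reports green
curl -X GET "http://node1_ip:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty"
```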

Verifying Node Membership

To confirm that all nodes see each other, check the node list:

curl -X GET "http://node1_ip:9200/_cat/nodes?v"

You should see three distinct entries corresponding to node-1, node-2, and node-3, each showing their IP addresses and roles (m for master-eligible, d for data).
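
The output will look roughly like the following; IPs, heap, and load figures will differ on your machines, and the asterisk in the master column marks the currently elected master:

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
node1_ip            42          71   3    0.12    0.18     0.20 dimr      *      node-1
node2_ip            38          69   2    0.08    0.15     0.17 dimr      -      node-2
node3_ip            40          70   2    0.10    0.14     0.16 dimr      -      node-3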


Step 5: Creating a Test Index with Replication

To verify the cluster's ability to distribute data and handle replication, we must create an index specifying at least one replica.

In a three-node cluster, setting number_of_replicas to 1 ensures that every primary shard has one copy (replica) distributed across a different node, providing immediate fault tolerance.

Create Index Command:

```bash
curl -X PUT "http://node1_ip:9200/test_data_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
'
```