Step-by-Step Guide to Configuring a Basic Three-Node Cluster
Setting up a resilient Elasticsearch cluster is fundamental for achieving high availability and horizontal scalability in your search and analytics infrastructure. A three-node cluster provides an excellent starting point, offering the redundancy necessary to withstand the failure of a single node without service interruption. This guide walks you through installing, configuring, and verifying a basic three-node Elasticsearch cluster, suitable for development environments or small-scale production deployments.
By the end of this tutorial, you will have a functional cluster where data can be distributed and replicated safely, leveraging Elasticsearch's core distributed capabilities.
Prerequisites
Before beginning the configuration, ensure you have the following in place:
- Three Separate Servers/VMs: Each will host one node. For this guide, we assume you have three distinct machines or Docker containers ready.
- Java Development Kit (JDK): Recent Elasticsearch versions (7.0+) ship with a bundled JDK, so a separate installation is usually unnecessary. If you supply your own, it must be a version compatible with your Elasticsearch release (e.g., JDK 17).
- Network Connectivity: Ensure that all three nodes can communicate with each other over the necessary ports (default HTTP port: 9200, default transport port: 9300).
- Identical Elasticsearch Installation: Download and extract the same version of Elasticsearch onto all three nodes.
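Before installing anything, it can save time to verify that the required ports are open between machines. A minimal sketch using `nc` (netcat), where `node1_ip`, `node2_ip`, and `node3_ip` are placeholders for your servers' real addresses:

```shell
#!/bin/sh
# Probe the HTTP (9200) and transport (9300) ports on every node.
# node1_ip / node2_ip / node3_ip are placeholders -- substitute real IPs.
for host in node1_ip node2_ip node3_ip; do
  for port in 9200 9300; do
    if nc -z -w 2 "$host" "$port" 2>/dev/null; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done
done
```

Run this from each node in turn; every host:port pair should report reachable before you proceed.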
Step 1: Configuring Each Node's elasticsearch.yml
The configuration file, elasticsearch.yml, located in the config/ directory of your Elasticsearch installation, is crucial for defining how each node behaves within the cluster. You must adjust settings specific to each node.
For a three-node cluster, it is common practice to let all three nodes hold the master, data, and ingest roles, so every node is both master-eligible and data-eligible.
Common Settings for All Nodes
Ensure these settings are identical across all three configuration files:
```yaml
# Cluster name: must be the same on all nodes
cluster.name: my-three-node-cluster

# Discovery settings (crucial for initial joining)
# Use a seed list of known nodes to bootstrap discovery
discovery.seed_hosts: ["node1_ip:9300", "node2_ip:9300", "node3_ip:9300"]

# Used only on the very first cluster startup to bootstrap master election.
# Must list the node.name values of the master-eligible nodes.
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# Network settings (ensure binding to the correct IP)
network.host: 0.0.0.0  # Or the specific private IP of the host

# HTTP port (external access)
http.port: 9200

# Transport port (internal cluster communication)
transport.port: 9300
```
Unique Settings Per Node
Each node requires a unique node.name and potentially a unique path.data if running on the same machine or sharing storage.
Node 1 Configuration (node1_ip)
```yaml
# Unique identifier for Node 1
node.name: node-1

# If data paths differ (e.g., multiple nodes sharing a machine)
# path.data: /var/lib/elasticsearch/data_node1
```
Node 2 Configuration (node2_ip)
```yaml
# Unique identifier for Node 2
node.name: node-2
```
Node 3 Configuration (node3_ip)
```yaml
# Unique identifier for Node 3
node.name: node-3
```
Important Note on cluster.initial_master_nodes: This setting is used only when the cluster starts for the very first time. Once the cluster has formed, Elasticsearch manages master elections internally and the setting is ignored; Elastic recommends removing it after the first successful formation. The names listed must exactly match each node's node.name setting, and a full-cluster restart does not re-bootstrap, so the setting should not be reintroduced later.
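Once the cluster has formed successfully, the cleanup on each node is simply to delete or comment out the bootstrap line in elasticsearch.yml, for example:

```yaml
# Remove or comment out after the first successful cluster formation:
# cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
```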
Step 2: Setting Up Roles (Optional but Recommended)
While the default configuration allows nodes to take on all roles (master, data, ingest, coordinating), larger deployments often dedicate nodes to specific roles. For a robust three-node setup, we keep every node master-eligible.
Add the following roles configuration to all three elasticsearch.yml files:
```yaml
# Enable all standard roles on all nodes for this initial setup
node.roles: [ master, data, ingest, remote_cluster_client ]
```
Handling Quorum for Resilience
With three master-eligible nodes, the cluster can tolerate the loss of one node while maintaining a majority (2 of 3 voting nodes remain). Elasticsearch manages this voting configuration automatically once the cluster has formed; the cluster.initial_master_nodes list from Step 1 is used only for the initial bootstrap.
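The majority arithmetic behind this tolerance can be sketched with plain shell arithmetic (an illustration of the math, not an Elasticsearch command):

```shell
# A majority (quorum) of N voting nodes is floor(N/2) + 1.
nodes=3
quorum=$(( nodes / 2 + 1 ))
tolerable=$(( nodes - quorum ))
echo "nodes=$nodes quorum=$quorum tolerable_failures=$tolerable"
# prints: nodes=3 quorum=2 tolerable_failures=1
```

This is also why two-node clusters are fragile: a quorum of 2 means losing either node stops master election.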
Step 3: Starting the Cluster Nodes
Start Elasticsearch on each node in turn. Startup order does not matter: modern Elasticsearch handles unordered startup well, and the nodes will discover each other via the seed hosts list.
On Node 1, Node 2, and Node 3:
Navigate to your Elasticsearch installation directory and run:
```bash
# For running in the foreground (useful for debugging)
bin/elasticsearch

# For running in the background as a daemon (recommended for production)
bin/elasticsearch -d
```
Monitor the logs (logs/elasticsearch.log) on each node for successful startup messages, particularly those indicating they have successfully joined the cluster.
Step 4: Verifying Cluster Health
Once all nodes have started, use the _cat/health API, accessible via any node's HTTP port (default 9200), to confirm the cluster status.
Access this from a machine that can reach the nodes (e.g., via curl):
Check Health:
```bash
curl -X GET "http://node1_ip:9200/_cat/health?v"
```
Expected Output Snippet:
| epoch | timestamp | cluster | status | node.total | node.data | shards | pri | relo | init | unassign | pending_tasks | max_task_wait_time | active_shards_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1701331200 | 12:00:00 | my-three-node-cluster | green | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | - | 100.0% |
If status is green and node.total is 3, your cluster is up and running correctly.
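For automation, the status column can be pulled out of the plain `_cat/health` output (without `?v`) with `awk`. A sketch using a sample response line in place of a live `curl` call:

```shell
# Sample line standing in for: curl -s "http://node1_ip:9200/_cat/health"
line="1701331200 12:00:00 my-three-node-cluster green 3 3 0 0 0 0 0 0 - 100.0%"

# status is the fourth whitespace-separated column
status=$(echo "$line" | awk '{print $4}')
echo "cluster status: $status"
# prints: cluster status: green
```

A monitoring script can then alert on any value other than green.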
Verifying Node Membership
To confirm that all nodes see each other, check the node list:
```bash
curl -X GET "http://node1_ip:9200/_cat/nodes?v"
```
You should see three distinct entries corresponding to node-1, node-2, and node-3, each showing its IP address and role letters in the node.role column (m for master-eligible, d for data, i for ingest); the currently elected master is marked with an asterisk in the master column.
Step 5: Creating a Test Index with Replication
To verify the cluster's ability to distribute data and handle replication, we must create an index specifying at least one replica.
In a three-node cluster, setting number_of_replicas to 1 ensures that every primary shard has one copy (replica) distributed across a different node, providing immediate fault tolerance.
Create Index Command:
```bash
curl -X PUT "http://node1_ip:9200/test_data_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'
```
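The expected shard layout follows directly from these settings: total shard copies = primaries × (1 + replicas). A quick sanity check of that arithmetic:

```shell
# 3 primaries, 1 replica each -> 6 shard copies across the 3 nodes
primaries=3
replicas=1
total=$(( primaries * (1 + replicas) ))
echo "total shard copies: $total"
# prints: total shard copies: 6
```

You can then confirm the actual distribution with `curl "http://node1_ip:9200/_cat/shards/test_data_index?v"`, which should list six shard copies (three primaries, three replicas), with each replica allocated on a different node than its primary.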