Guide to Setting Up a High-Availability Elasticsearch Cluster
Elasticsearch is a powerful, distributed search and analytics engine designed for scalability and resilience. In production environments, ensuring continuous operation and fault tolerance is paramount. This guide will walk you through the essential steps for configuring multiple Elasticsearch nodes to create a robust, high-availability (HA) cluster. By following these instructions, you'll learn how to set up your cluster to withstand node failures and maintain data accessibility, ensuring your applications remain responsive and your data remains secure.
Setting up a high-availability Elasticsearch cluster involves careful planning of node roles, network configuration, and data replication strategies. The goal is to distribute workload and data redundantly across multiple machines, eliminating single points of failure. This article will cover the core concepts, practical configuration steps, and best practices to help you build a resilient Elasticsearch infrastructure, suitable for demanding production use cases.
Understanding High-Availability in Elasticsearch
High-availability in Elasticsearch is achieved through several key mechanisms:
- Distributed Architecture: Elasticsearch inherently distributes data and operations across multiple nodes.
- Node Roles: Different nodes can serve different purposes, allowing for specialized resource allocation and failure isolation.
- Shard Replication: Each index is divided into shards, and each primary shard can have one or more replica shards, stored on different nodes.
- Master Node Election: A robust election process ensures a master node is always available to manage the cluster state.
- Discovery and cluster coordination: Elasticsearch's built-in discovery module (which replaced Zen Discovery in version 7.0) handles node discovery and master election, ensuring nodes can find each other and form a cluster reliably.
Essential Node Roles
In an HA setup, understanding node roles is crucial. The primary roles for HA are:
- Master-eligible nodes: These nodes are responsible for managing the cluster state, including index creation/deletion, tracking nodes, and shard allocation. They do not store data or handle search/index requests directly unless they also have the data role. For HA, you should have an odd number (typically 3) of dedicated master-eligible nodes to form a quorum.
- Data nodes: These nodes store your indexed data in shards and perform data-related operations like search, aggregation, and indexing. They are the workhorses of your cluster.
- Coordinating-only nodes: (Optional) These nodes can be used to route requests, handle search reduce phases, and manage bulk indexing. They don't hold data or cluster state but can offload work from data and master nodes.
Shards and Replicas
Elasticsearch stores your data in shards. Each index consists of one or more primary shards. To achieve high availability, you should configure one or more replica shards for each primary shard. Replica shards are copies of primary shards. If a node hosting a primary shard fails, a replica shard on another node can be promoted to be the new primary, ensuring no data loss and continued operation.
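As a quick sanity check for capacity planning, the total number of shard copies an index places on the cluster is primaries × (1 + replicas). A minimal shell sketch with example numbers:

```shell
# Total shard copies = primary shards * (1 + replica count).
# With 3 primaries and 1 replica, the cluster stores 6 shard copies,
# so losing any single node still leaves a complete copy of the data.
primaries=3
replicas=1
total=$((primaries * (1 + replicas)))
echo "total shard copies: $total"
```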
Prerequisites for Setting Up an HA Cluster
Before diving into configuration, ensure your environment meets these basic requirements:
- Java Development Kit (JDK): Recent Elasticsearch versions ship with a bundled JDK, so a separate install is usually unnecessary. If you supply your own, ensure a compatible OpenJDK release is installed on all nodes.
- System Resources: Allocate sufficient RAM (e.g., 8-32GB), CPU cores, and fast I/O disk space (SSD recommended) for each node, especially data nodes.
- Network Configuration: All nodes must be able to communicate with each other over specific ports (default 9300 for inter-node communication, 9200 for HTTP API). Ensure firewalls are configured appropriately.
- Operating System: A stable Linux distribution (e.g., Ubuntu, CentOS, RHEL) is generally preferred for production deployments.
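As an illustration of the firewall requirement, on Ubuntu with ufw the two ports can be opened to the cluster subnet only. This is a sketch: the 192.168.1.0/24 subnet is an assumption matching the example addresses used later in this guide, so adjust it to your network and firewall tooling.

```shell
# Allow inter-node transport traffic (9300) only from the cluster subnet.
sudo ufw allow from 192.168.1.0/24 to any port 9300 proto tcp
# Allow HTTP API traffic (9200) from trusted clients; here the same subnet.
sudo ufw allow from 192.168.1.0/24 to any port 9200 proto tcp
```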
Step-by-Step Guide to HA Cluster Setup
This section outlines the process for installing and configuring a multi-node Elasticsearch cluster.
Step 1: Install Elasticsearch on All Nodes
Install Elasticsearch on each server that will be part of your cluster. You can use package managers (APT for Debian/Ubuntu, YUM for RHEL/CentOS) or download the archive directly.
Example (Debian/Ubuntu via APT):
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install elasticsearch
After installation, reload systemd and enable the service (we'll configure it before starting it).
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
Step 2: Configure elasticsearch.yml on Each Node
The elasticsearch.yml file, typically located in /etc/elasticsearch/, is where you define your cluster's settings. Edit this file on each node with the appropriate configurations.
Common Configuration for All Nodes
- cluster.name: This must be identical for all nodes you want to join the same cluster.

```yaml
cluster.name: my-ha-cluster
```

- node.name: A unique name for each node, helpful for identification.

```yaml
node.name: node-1
```

- network.host: Binds Elasticsearch to a specific network interface. Use 0.0.0.0 to bind to all available interfaces, or a specific IP address.

```yaml
network.host: 0.0.0.0
# or a specific IP address for security/multi-NIC setups:
# network.host: 192.168.1.101
```

- http.port: The port for HTTP client communication (default 9200).

```yaml
http.port: 9200
```

- transport.port: The port for inter-node communication (default 9300). Should be consistent across nodes.

```yaml
transport.port: 9300
```
Discovery Settings (Crucial for HA)
These settings tell nodes how to find each other and form a cluster.
- discovery.seed_hosts: A list of addresses of master-eligible nodes in your cluster. This is how nodes discover initial master-eligible nodes. Provide the IP addresses or hostnames of all your master-eligible nodes.

```yaml
discovery.seed_hosts: ["192.168.1.101", "192.168.1.102", "192.168.1.103"]
```

- cluster.initial_master_nodes: Used only when bootstrapping a brand-new cluster for the first time. This list should contain the node.name of the master-eligible nodes that will participate in the first master election. Once the cluster has formed, this setting is ignored.

```yaml
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
```

- Important Tip: Remove or comment out cluster.initial_master_nodes after the cluster has successfully formed to prevent unintended behavior if a node restarts and tries to form a new cluster.
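Putting the discovery settings together, here is a minimal elasticsearch.yml sketch for one master-eligible node (node-1, using the placeholder addresses from the examples above):

```yaml
cluster.name: my-ha-cluster
node.name: node-1
node.roles: [master]
network.host: 192.168.1.101
discovery.seed_hosts: ["192.168.1.101", "192.168.1.102", "192.168.1.103"]
# Only for the very first cluster bootstrap; remove once the cluster has formed.
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
```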
Node Role Configuration
Specify the role(s) for each node. A common HA setup involves 3 dedicated master nodes and several data nodes.
- Master-eligible Nodes (e.g., node-1, node-2, node-3):

```yaml
node.roles: [master]
```

- Data Nodes (e.g., node-4, node-5, node-6):

```yaml
node.roles: [data]
```

- Mixed Role Nodes (not recommended for large production HA):

```yaml
node.roles: [master, data]
```

- Best Practice: For true high availability and stability in production, dedicate separate nodes for master and data roles. This isolates critical master processes from resource-intensive data operations.
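For comparison, a dedicated data node's elasticsearch.yml only needs the seed hosts, not the bootstrap list. A sketch for a hypothetical node-4 (addresses are the placeholder values used above):

```yaml
cluster.name: my-ha-cluster
node.name: node-4
node.roles: [data]
network.host: 192.168.1.104
discovery.seed_hosts: ["192.168.1.101", "192.168.1.102", "192.168.1.103"]
```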
Step 3: Configure JVM Heap Size
Edit /etc/elasticsearch/jvm.options (or, on recent versions, drop a file into /etc/elasticsearch/jvm.options.d/) to set the JVM heap size. A good rule of thumb is to allocate 50% of available RAM, but never more than about 30-32GB, since beyond that the JVM loses compressed object pointers and larger heaps can actually hurt performance. For example, if a server has 16GB RAM, allocate 8GB:
-Xms8g
-Xmx8g
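The rule of thumb above can be expressed as a small shell calculation (a sketch; the 31 GB cap approximates the compressed-oops threshold):

```shell
# Heap = half of total RAM, capped at 31 GB to stay under the
# JVM's compressed object pointer (oops) limit.
ram_gb=16                         # example: a 16 GB server
heap_gb=$((ram_gb / 2))
if [ "$heap_gb" -gt 31 ]; then heap_gb=31; fi
echo "-Xms${heap_gb}g"
echo "-Xmx${heap_gb}g"
```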
Step 4: System Settings
For production, increase vm.max_map_count and the open-file ulimit on all nodes. Add this line to /etc/sysctl.conf and apply it (sudo sysctl -p).
vm.max_map_count=262144
And in /etc/security/limits.conf (or /etc/security/limits.d/99-elasticsearch.conf):
elasticsearch - nofile 65536
elasticsearch - memlock unlimited
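After applying, it's worth verifying that both settings took effect. Note that the limits.conf changes only apply to new sessions, so the elasticsearch service must be (re)started after the change.

```shell
# Should print: vm.max_map_count = 262144
sysctl vm.max_map_count
# Should print 65536 (or higher) for the elasticsearch user
sudo -u elasticsearch bash -c 'ulimit -n'
```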
Step 5: Start Elasticsearch Services
Start the Elasticsearch service on all configured nodes. It's often recommended to start master-eligible nodes first, but with modern discovery, the order is less critical as long as discovery.seed_hosts is correctly configured.
sudo systemctl start elasticsearch
Check the service status and logs for any errors:
sudo systemctl status elasticsearch
sudo journalctl -f -u elasticsearch
Step 6: Verify Cluster Health
Once all nodes are running, verify the cluster health using the Elasticsearch API. You can query any node in the cluster.
curl -X GET "localhost:9200/_cat/health?v&pretty"
Expected Output:
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1678886400 12:00:00 my-ha-cluster green 6 3 0 0 0 0 0 0 - 100.0%
- status: Should be green (all primary and replica shards are allocated) or yellow (all primary shards are allocated, but some replica shards are not yet). red indicates a serious problem.
- node.total: Should match the total number of nodes you started.
- node.data: Should match the number of data nodes.
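In scripts, such as a deployment pipeline, you don't have to poll _cat/health manually: the cluster health API can block until the cluster reaches a given status.

```shell
# Returns once the cluster is green, or after 30s with "timed_out": true.
curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty"
```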
Check nodes to ensure they've all joined the cluster:
curl -X GET "localhost:9200/_cat/nodes?v&pretty"
Expected Output (example for 3 master, 3 data nodes):
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.101 21 87 0 0.00 0.01 0.05 m * node-1
192.168.1.102 20 88 0 0.00 0.01 0.05 m - node-2
192.168.1.103 22 86 0 0.00 0.01 0.05 m - node-3
192.168.1.104 35 90 1 0.10 0.12 0.11 d - node-4
192.168.1.105 32 89 1 0.11 0.13 0.10 d - node-5
192.168.1.106 30 91 1 0.12 0.10 0.09 d - node-6
This shows node-1 as the elected master (* under master column) and other nodes as part of the cluster.
Step 7: Configure Index Sharding and Replication
For newly created indices, Elasticsearch defaults to one primary shard and one replica (index.number_of_shards: 1, index.number_of_replicas: 1). For HA, you typically want at least one replica, meaning your data exists on at least two different nodes. This ensures that if one node fails, a replica is available elsewhere.
When creating an index, specify these settings:
```bash
curl -X PUT "localhost:9200/my_ha_index?pretty" -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"number_of_shards": 3