Understanding Elasticsearch Master Node Election and Quorum Requirements
Elasticsearch is a distributed system, relying on coordination among nodes to maintain a consistent view of the cluster state. At the heart of this coordination lies the Master Node. The master node is the single source of truth for cluster metadata, and ensuring its stability and proper election is paramount for cluster health, scalability, and resilience.
This article details the critical responsibilities of the master node, explains the modern election process used in recent Elasticsearch versions (7.x+), and clarifies the essential concept of quorum—the mechanism necessary to prevent the devastating scenario known as the split-brain problem.
The Critical Role of the Master Node
While data nodes handle the heavy lifting of indexing, searching, and storing data, the master node is responsible for managing the structure and metadata of the entire cluster. It does not typically participate in query or indexing operations unless it is also configured as a data node.
Master Node Responsibilities
- Cluster State Management: The master maintains and publishes the cluster state—a blueprint of the cluster's current configuration, including which indices exist, the mappings and settings for those indices, and the location of every shard.
- Node Management: Handling the joining and leaving of nodes, updating the cluster state accordingly.
- Index Management: Creating, deleting, or updating indices.
- Shard Allocation: Deciding where primary and replica shards should reside (initial allocation and rebalancing after node failure).
If the master node fails, the cluster cannot perform administrative tasks or reallocate shards until a new master is successfully elected.
Understanding Master Election and Voting
In a distributed system, an election process is required whenever the current master node fails or becomes unreachable. Since Elasticsearch 7.0, the election mechanism has been significantly simplified and hardened, primarily through the elimination of the complex discovery.zen.minimum_master_nodes setting and the introduction of self-managed Voting Configurations.
The Election Process (Elasticsearch 7.x+)
Master election is now handled automatically by the master-eligible nodes, which are defined in the configuration using node.roles: [master, data], or just node.roles: [master] for dedicated masters.
- Candidate Discovery: Master-eligible nodes communicate to determine the set of active voting members.
- Quorum Check: Nodes check if they can reach a quorum—a majority of the known voting nodes—to ensure consensus.
- Leader Selection: If a quorum is established, the candidate holding the most recently accepted cluster state is proposed as the new master, with node ID acting as a deterministic tie-breaker.
- Voting and Commitment: The proposal is voted on, and if accepted by the majority, the new master takes control and publishes the new cluster state.
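The quorum check and majority vote at the heart of these steps can be illustrated with a short sketch. This is hypothetical Python, not Elasticsearch's actual implementation; the tie-break rule shown (lowest node name) is a stand-in for the real one:

```python
# Hypothetical sketch of the election steps above; the names and the
# tie-break rule are illustrative, not Elasticsearch's source code.

def quorum(n_voters):
    """Majority threshold: floor(N / 2) + 1."""
    return n_voters // 2 + 1

def elect(reachable, all_voters):
    """Return the winning node name, or None if no quorum is reachable."""
    # Step 2: quorum check -- a majority of all voters must be reachable.
    if len(reachable) < quorum(len(all_voters)):
        return None
    # Step 3: a deterministic tie-break (here simply the lowest node name)
    # ensures every reachable voter proposes the same candidate.
    candidate = min(reachable)
    # Step 4: the reachable voters form a majority, so the vote commits.
    return candidate

voters = ["node-1", "node-2", "node-3"]
print(elect(["node-2", "node-3"], voters))  # node-2 (quorum of 2/3 held)
print(elect(["node-3"], voters))            # None (no quorum reachable)
```

Note that a lone node can never elect itself in a three-voter configuration: the quorum check fails before any vote is cast, which is exactly the property that prevents split-brain.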
Initial Cluster Bootstrapping
When starting a brand new cluster for the first time, Elasticsearch needs to know which nodes should participate in the initial voting configuration. This is handled using the cluster.initial_master_nodes setting. This setting should only be used once during the initial startup of the cluster.
# elasticsearch.yml snippet for initial setup
cluster.name: my-production-cluster
node.name: node-1
node.roles: [master, data]
# List the names of all master-eligible nodes used for the initial bootstrap
cluster.initial_master_nodes: [node-1, node-2, node-3]
Tip: Once the cluster is running and stable, remove or comment out the cluster.initial_master_nodes setting from the configuration files of all nodes to avoid potential issues if nodes are later restarted in a mixed state.
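The bootstrap list must be identical on every node it names. For example, node-2's configuration would mirror node-1's snippet above (hypothetical values consistent with that example):

```yaml
# elasticsearch.yml on node-2: same cluster.name and same bootstrap list
cluster.name: my-production-cluster
node.name: node-2
node.roles: [master, data]
cluster.initial_master_nodes: [node-1, node-2, node-3]
```

If the lists differ between nodes, the nodes may attempt to bootstrap separate clusters, which is one of the few ways to create a split-brain by misconfiguration alone.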
Quorum Requirements and Split-Brain Prevention
The fundamental reason for quorum requirements is to guarantee that only one leader can be elected at any time, thereby preventing the split-brain problem.
What is Split-Brain?
Split-brain occurs when a network partition divides the cluster into two or more isolated segments, and each segment believes it is the authoritative master. If this happens, nodes in different segments may independently accept indexing requests and allocate shards, leading to data inconsistency and corruption when the network eventually heals.
The Quorum Rule (Majority Consensus)
To prevent split-brain, Elasticsearch enforces a majority consensus rule, requiring a minimum number of voting nodes to agree on any cluster state change. This minimum is the quorum, calculated as:
$$\text{Quorum} = \lfloor (\text{Number of Voting Nodes} / 2) \rfloor + 1$$
By requiring a strict majority, if the network partitions, only the larger side (which holds the majority) can reach the quorum and continue operating. The smaller side, unable to elect a master, will halt and wait for network connectivity to restore, thus avoiding data writes to the partitioned segment.
| Number of Voting Master Nodes (N) | Required Quorum (⌊N/2⌋ + 1) |
|---|---|
| 3 | 2 |
| 5 | 3 |
| 7 | 4 |
Best Practice Warning: Always deploy an odd number of master-eligible nodes (e.g., three or five). An even number (e.g., four) offers the same fault tolerance as the preceding odd number (three) while consuming more resources: a 4-node voting configuration requires 3 votes (⌊4/2⌋ + 1 = 3), so it tolerates only one failure, the same as a 3-node configuration, but uses one extra server.
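The table and the odd-number rule can be verified with a few lines of arithmetic (a sketch of the majority math, not an Elasticsearch API):

```python
def quorum(n):
    """Majority threshold: floor(N / 2) + 1."""
    return n // 2 + 1

def fault_tolerance(n):
    """How many voting nodes can fail while a quorum stays reachable."""
    return n - quorum(n)

for n in (3, 4, 5, 7):
    print(f"{n} voters: quorum={quorum(n)}, tolerates={fault_tolerance(n)}")
# 3 voters -> quorum 2, tolerates 1 failure
# 4 voters -> quorum 3, tolerates 1 failure (the extra node buys nothing)
# 5 voters -> quorum 3, tolerates 2 failures
```

The jump in fault tolerance happens only at each odd N, which is why even-sized voting configurations waste a node.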
Configuring Dedicated Master Nodes
For production environments, especially large clusters (20+ data nodes), it is highly recommended to use dedicated master nodes. This separates the resource-intensive tasks of searching/indexing from the crucial administrative duties of the master.
Node Configuration Example (Dedicated Master)
To configure a node to be master-eligible but not store data or run ingest pipelines, use the following roles in elasticsearch.yml:
# Node 1: Dedicated Master
node.name: es-master-01
node.roles: [master]
# Note: the http.enabled setting was removed in Elasticsearch 7.0, so HTTP can
# no longer be disabled in configuration; restrict client access to dedicated
# masters at the network/firewall level instead.
# transport.bind_host: [private_ip_of_master]
Node Configuration Example (Dedicated Data Node)
Conversely, a dedicated data node should be prevented from participating in the master election process:
# Node 4: Dedicated Data Node
node.name: es-data-04
node.roles: [data]
# Note: if node.roles is omitted entirely, the node assumes all roles
# (master, data, ingest, and others), so dedicated nodes must set it explicitly.
Cluster Discovery Settings
All nodes must be configured to find the same set of master-eligible nodes using the discovery.seed_hosts setting. This setting lists the network addresses where Elasticsearch can attempt to contact other nodes to join the cluster.
# Common setting for all nodes in the cluster
discovery.seed_hosts: ["10.0.0.1:9300", "10.0.0.2:9300", "10.0.0.3:9300"]
This list should contain the addresses of the master-eligible nodes (es-master-01, es-master-02, es-master-03, etc.).
Troubleshooting Election Issues
If the cluster fails to elect a master, cluster-level APIs return errors such as master_not_discovered_exception and the logs show persistent election warnings. Common causes include:
| Issue | Description & Solution |
|---|---|
| Network Issues | Nodes cannot communicate with each other due to firewall rules, routing issues, or high latency. Ensure the transport port (9300) is open between all nodes; port 9200 (HTTP) is only required for client traffic. |
| Configuration Mismatch | cluster.name is incorrect or discovery.seed_hosts does not point to the correct master-eligible nodes. Verify all nodes use identical settings. |
| Quorum Loss | Too many master-eligible nodes have failed simultaneously (e.g., two failures in a 3-node master setup), so no majority can form. Planned removals of voting nodes can be handled safely with the POST /_cluster/voting_config_exclusions API; if quorum is permanently lost, recovery may require the elasticsearch-node unsafe-bootstrap tool. |
| Disk I/O | The master node's disk I/O is saturated, preventing it from publishing the cluster state quickly, leading to timeouts and repeated elections. |
Checking the Voting Configuration
You can inspect the current voting configuration using the Cluster API:
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
The response lists the node IDs currently counted toward the quorum, letting you confirm the voting configuration matches your fault tolerance goals.
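As a sketch, the API response can be fed into a small check like the following. The sample dict below is hypothetical; a real response comes from the cluster state API and contains node IDs under metadata.cluster_coordination.last_committed_config:

```python
# Sketch: derive quorum size and fault tolerance from a cluster-state
# response. The sample body is hypothetical, shaped like the real API output.
sample_response = {
    "metadata": {
        "cluster_coordination": {
            "last_committed_config": ["nodeId-aaa", "nodeId-bbb", "nodeId-ccc"]
        }
    }
}

def voting_summary(state):
    """Summarize the committed voting configuration in a state response."""
    voters = state["metadata"]["cluster_coordination"]["last_committed_config"]
    n = len(voters)
    q = n // 2 + 1  # majority quorum: floor(N/2) + 1
    return {"voters": n, "quorum": q, "tolerable_failures": n - q}

print(voting_summary(sample_response))
# {'voters': 3, 'quorum': 2, 'tolerable_failures': 1}
```

Running a check like this after any change to the master-eligible node set is a quick way to catch an accidental even-sized (or too small) voting configuration.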
Summary
The master node election process is the backbone of an Elasticsearch cluster's resilience. By understanding the responsibilities of the master and correctly implementing the quorum rule (using an odd number of master-eligible nodes and ensuring correct cluster.initial_master_nodes during bootstrap), administrators can reliably prevent split-brain scenarios and maintain a highly available distributed system. Always use dedicated master nodes in production to isolate administrative tasks and ensure reliable cluster state publication.