Understanding Elasticsearch Master Node Election and Quorum Requirements

How Elasticsearch master elections and quorum work, with practical guidance for avoiding split-brain and unsafe bootstrap settings.

Elasticsearch master node quorum decides whether your cluster can safely elect a leader, publish cluster-state changes, and avoid two disconnected sides of the same cluster making different decisions.

The elected master does not make searches faster by itself. It coordinates the cluster. When elections are unstable, the symptoms can look scattered: index creation hangs, shard allocation stops, Kibana reports changing health states, and logs repeat messages about a master not being discovered.

The Critical Role of the Master Node

While data nodes handle the heavy lifting of indexing, searching, and storing data, the master node is responsible for managing the structure and metadata of the entire cluster. It does not typically participate in query or indexing operations unless it is also configured as a data node.

Master Node Responsibilities

  1. Cluster State Management: The master maintains and publishes the cluster state—a blueprint of the cluster's current configuration, including which indices exist, the mappings and settings for those indices, and the location of every shard.
  2. Node Management: Handling the joining and leaving of nodes, updating the cluster state accordingly.
  3. Index Management: Creating, deleting, or updating indices.
  4. Shard Allocation: Deciding where primary and replica shards should reside (initial allocation and rebalancing after node failure).

If the elected master fails, the cluster pauses master-only work until another master is elected. Existing searches may continue for available shards, but index creation, mapping updates, and allocation decisions depend on stable master coordination.

Understanding Master Election and Voting

In a distributed system, an election process is required whenever the current master node fails or becomes unreachable. Since Elasticsearch 7.0, the election mechanism has been significantly simplified and hardened, primarily through the elimination of the complex discovery.zen.minimum_master_nodes setting and the introduction of self-managed Voting Configurations.

The Election Process (Elasticsearch 7.x+)

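Master election is handled automatically by the master-eligible nodes. These are defined in the configuration with node.roles: [master, data], or with node.roles: [master] for dedicated masters. The node.roles setting was introduced in 7.9; earlier 7.x releases used node.master: true and related boolean settings instead.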

  1. Candidate Discovery: Master-eligible nodes communicate to determine the set of active voting members.
  2. Quorum Check: Nodes check if they can reach a quorum—a majority of the known voting nodes—to ensure consensus.
  3. Leader Selection: If a quorum is established, Elasticsearch's coordination subsystem selects a master according to its internal election rules.
  4. Voting and Commitment: The proposal is voted on, and if accepted by the majority, the new master takes control and publishes the new cluster state.

Initial Cluster Bootstrapping

When starting a brand new cluster for the first time, Elasticsearch needs to know which nodes should participate in the initial voting configuration. This is handled using the cluster.initial_master_nodes setting. This setting should only be used once during the initial startup of the cluster.

# elasticsearch.yml snippet for initial setup
cluster.name: my-production-cluster
node.name: node-1
node.roles: [master, data]

# List the names of all master-eligible nodes used for the initial bootstrap
cluster.initial_master_nodes: [node-1, node-2, node-3]

Tip: Once the cluster has formed, remove cluster.initial_master_nodes from every node. Leaving it in place can be dangerous during later restarts because this setting is only meant for the first bootstrap of a brand new cluster.
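A leftover bootstrap setting is easy to miss across many nodes. The following is a minimal sketch of a check, assuming the setting is written in the flat dotted form shown above; the function name has_bootstrap_setting is hypothetical, and the nested YAML form (cluster: followed by an indented initial_master_nodes:) is not detected.

```python
import re

# Matches an uncommented, flat-form cluster.initial_master_nodes key.
# Lines starting with '#' do not match because only whitespace may precede the key.
_BOOTSTRAP_KEY = re.compile(r"^\s*cluster\.initial_master_nodes\s*:", re.MULTILINE)


def has_bootstrap_setting(config_text: str) -> bool:
    """Return True if the config text still sets cluster.initial_master_nodes."""
    return _BOOTSTRAP_KEY.search(config_text) is not None


if __name__ == "__main__":
    sample = """\
cluster.name: my-production-cluster
node.name: node-1
# cluster.initial_master_nodes: [node-1, node-2, node-3]
"""
    print(has_bootstrap_setting(sample))  # False: the setting is commented out
```

Run against the contents of each node's elasticsearch.yml after the cluster has formed, a True result marks a node that still needs the setting removed.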

Quorum Requirements and Split-Brain Prevention

The fundamental reason for quorum requirements is to guarantee that only one leader can be elected at any time, thereby preventing the split-brain problem.

What is Split-Brain?

Split-brain occurs when a network partition divides the cluster into isolated segments, and more than one segment believes it has the authoritative master. If that happens, different sides can accept conflicting cluster-state changes, which is exactly what quorum is designed to prevent.

The Quorum Rule (Majority Consensus)

To prevent split-brain, Elasticsearch enforces a majority consensus rule, requiring a minimum number of voting nodes to agree on any cluster state change. This minimum is the quorum, calculated as:

$$\text{Quorum} = \lfloor (\text{Number of Voting Nodes} / 2) \rfloor + 1$$

Because a strict majority is required, at most one side of a network partition can reach quorum and continue operating, and only if it still holds a majority of the voting nodes. Any side without a majority cannot elect a master, so it halts master-level work and waits for connectivity to return instead of accepting conflicting cluster-state changes.
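The majority rule can be checked with a few lines of arithmetic. This is an illustrative sketch of the formula above, not Elasticsearch's own code; the function names quorum and sides_with_quorum are hypothetical.

```python
def quorum(voting_nodes: int) -> int:
    """Smallest strict majority of the voting configuration: floor(N / 2) + 1."""
    if voting_nodes < 1:
        raise ValueError("need at least one voting node")
    return voting_nodes // 2 + 1


def sides_with_quorum(partition_sizes: list[int]) -> list[bool]:
    """For each side of a partition, report whether it can still reach quorum."""
    needed = quorum(sum(partition_sizes))
    return [size >= needed for size in partition_sizes]


if __name__ == "__main__":
    print(sides_with_quorum([2, 1]))     # [True, False]: 3 voters split 2/1
    print(sides_with_quorum([2, 2, 1]))  # [False, False, False]: 5 voters, no side has 3
```

The second case shows why "the larger side wins" is not quite the rule: with a three-way split, no side holds a majority, and the whole cluster waits.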

| Number of Voting Master Nodes (N) | Required Quorum (floor(N/2) + 1) |
|---|---|
| 3 | 2 |
| 5 | 3 |
| 7 | 4 |

Best Practice Warning: Deploy an odd number of master-eligible nodes (for example, three or five). An even number offers the same fault tolerance as the preceding odd number but uses more resources. A voting configuration of four nodes needs 3 votes (floor(4/2) + 1), so it tolerates only one failure, the same as three nodes, while running one extra server. In practice, Elasticsearch 7.x and later keeps the voting configuration odd by leaving one master-eligible node out of it when the count is even, so the fourth node adds cost without adding resilience.
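The odd-versus-even comparison follows directly from the quorum formula. As a small sketch (the helper name tolerated_failures is hypothetical), the number of voting nodes that can fail is N minus the quorum:

```python
def tolerated_failures(voting_nodes: int) -> int:
    """Voting nodes that can fail while a strict majority is still reachable."""
    quorum = voting_nodes // 2 + 1
    return voting_nodes - quorum


# Columns: voting nodes, quorum, tolerated failures
for n in range(3, 8):
    print(n, n // 2 + 1, tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
# 7 4 3
```

Each even size tolerates exactly as many failures as the odd size below it.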

Configuring Dedicated Master Nodes

For production environments, three dedicated master-eligible nodes are the common baseline. This separates search and indexing load from coordination work. Small development clusters can run mixed roles, but a cluster that matters should not let a heavy aggregation or ingest spike starve the elected master.

Node Configuration Example (Dedicated Master)

To configure a node to be master-eligible but not store data or run ingest pipelines, use the following roles in elasticsearch.yml:

# Node 1: Dedicated Master
node.name: es-master-01
node.roles: [master]

# Bind transport to a private network and restrict access with firewalls/security groups.
# network.host: 10.0.0.1

Node Configuration Example (Dedicated Data Node)

Conversely, a dedicated data node should be prevented from participating in the master election process:

# Node 4: Dedicated Data Node
node.name: es-data-04
node.roles: [data]

# If no roles are specified, Elasticsearch assigns the default role set for that version.
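When roles are spread across many config files, it helps to tally which nodes are actually master-eligible. Below is a minimal sketch that assumes you have already collected each node's node.roles value into a dictionary; the inventory and the function names are hypothetical, and special roles such as voting_only are not treated separately.

```python
def master_eligible(inventory: dict[str, list[str]]) -> list[str]:
    """Names of nodes whose role list includes 'master'."""
    return sorted(name for name, roles in inventory.items() if "master" in roles)


def dedicated_masters(inventory: dict[str, list[str]]) -> list[str]:
    """Names of nodes whose only role is 'master'."""
    return sorted(name for name, roles in inventory.items() if set(roles) == {"master"})


if __name__ == "__main__":
    # Hypothetical inventory: node name -> node.roles value from elasticsearch.yml.
    inventory = {
        "es-master-01": ["master"],
        "es-master-02": ["master"],
        "es-master-03": ["master"],
        "es-data-04": ["data"],
    }
    eligible = master_eligible(inventory)
    print(eligible)                # ['es-master-01', 'es-master-02', 'es-master-03']
    print(len(eligible) % 2 == 1)  # True: an odd number of master-eligible nodes
```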

Cluster Discovery Settings

All nodes must be configured to find the same set of master-eligible nodes using the discovery.seed_hosts setting. This setting lists the network addresses where Elasticsearch can attempt to contact other nodes to join the cluster.

# Common setting for all nodes in the cluster
discovery.seed_hosts: ["10.0.0.1:9300", "10.0.0.2:9300", "10.0.0.3:9300"]

This list should contain the addresses of the master-eligible nodes (es-master-01, es-master-02, es-master-03, etc.).
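Before blaming Elasticsearch for a discovery failure, it is worth confirming that each seed host accepts TCP connections on its transport port. The sketch below uses only the Python standard library; the function names are hypothetical, bracketed IPv6 addresses are not handled, and a successful TCP connection does not prove that the transport layer (for example TLS) is configured correctly.

```python
import socket


def parse_seed_host(entry: str, default_port: int = 9300) -> tuple[str, int]:
    """Split a 'host:port' seed entry; use default_port when no port is given."""
    host, sep, port = entry.rpartition(":")
    if not sep:
        return entry, default_port
    return host, int(port)


def transport_reachable(host: str, port: int = 9300, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_seed_hosts(entries: list[str], timeout: float = 2.0) -> dict[str, bool]:
    """Map each seed host entry to whether its transport port accepted a connection."""
    return {
        entry: transport_reachable(*parse_seed_host(entry), timeout=timeout)
        for entry in entries
    }
```

Calling check_seed_hosts(["10.0.0.1:9300", "10.0.0.2:9300", "10.0.0.3:9300"]) from each node gives a quick picture of which master-eligible nodes that node can reach.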

Troubleshooting Election Issues

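If the cluster cannot elect a master, it does not simply turn 'yellow' or 'red'. Health states are reported through the elected master, so without one, requests such as GET /_cluster/health typically time out or fail with a master_not_discovered_exception, and nodes log repeated warnings that the master has not been discovered. Common causes include: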

| Issue | Description & Solution |
|---|---|
| Network Issues | Nodes cannot communicate because of firewall rules, routing issues, DNS problems, or high latency. The transport port, commonly 9300, must be reachable between nodes. HTTP, commonly 9200, is for client/API access and is not the election channel. |
| Configuration Mismatch | cluster.name is incorrect or discovery.seed_hosts does not point to the correct master-eligible nodes. Verify that all nodes use the same cluster.name and a consistent seed host list. |
| Quorum Loss | Too many voting nodes have failed at once, such as two failures in a three-master setup. Bring the failed nodes back if at all possible. The voting configuration exclusions API needs an elected master, so it is a tool for removing healthy master-eligible nodes ahead of time, not for recovering lost quorum. If the missing nodes are gone permanently, the remaining options are restoring from a snapshot into a new cluster or, as a last resort that risks data loss, the elasticsearch-node unsafe-bootstrap tool. |
| Disk I/O | The master node's disk I/O is saturated, preventing it from publishing the cluster state quickly, leading to timeouts and repeated elections. |

Checking the Voting Configuration

You can inspect the current voting configuration using the Cluster API:

GET /_cluster/state?filter_path=metadata.cluster_coordination

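The response lists the IDs of the nodes currently counted toward the quorum, so you can confirm that the voting configuration matches your fault tolerance goals. These are node IDs rather than node names; GET /_cat/nodes?v&h=id,name&full_id=true maps them back to names.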
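Once you have the response, the quorum arithmetic can be applied to it directly. The sketch below assumes the response body has already been parsed into a dictionary and that the committed voters appear under metadata.cluster_coordination.last_committed_config; the node IDs in the sample and the function name voting_summary are hypothetical.

```python
def voting_summary(response: dict) -> dict:
    """Summarize voter count, quorum, and tolerated failures from a cluster-state response."""
    coordination = response["metadata"]["cluster_coordination"]
    voters = coordination.get("last_committed_config", [])
    size = len(voters)
    needed = size // 2 + 1 if size else 0
    return {"voters": size, "quorum": needed, "tolerated_failures": size - needed}


if __name__ == "__main__":
    # Hypothetical response body; real node IDs are opaque strings assigned by Elasticsearch.
    sample = {
        "metadata": {
            "cluster_coordination": {
                "term": 4,
                "last_committed_config": ["node-id-a", "node-id-b", "node-id-c"],
                "last_accepted_config": ["node-id-a", "node-id-b", "node-id-c"],
                "voting_config_exclusions": [],
            }
        }
    }
    print(voting_summary(sample))  # {'voters': 3, 'quorum': 2, 'tolerated_failures': 1}
```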

The safest production pattern is boring: three dedicated master-eligible nodes in separate failure domains, stable transport networking, correct seed hosts, and cluster.initial_master_nodes used only once. When elections fail, resist the urge to restart every node at once. Read the logs, confirm which master-eligible nodes can see each other, and make one controlled change at a time.