Elasticsearch Shard Allocation Issues: Causes and Solutions

Troubleshoot common Elasticsearch shard allocation failures leading to Yellow or Red cluster health. This guide explains critical causes, including disk space thresholds, node attribute mismatches, and primary shard loss. Learn to effectively use the Allocation Explain API and apply practical commands to restore cluster stability and ensure data availability.

Elasticsearch clusters are designed for resilience and high availability, relying heavily on the proper distribution and allocation of shards across nodes. When a shard fails to move from an UNASSIGNED or INITIALIZING state to an active STARTED primary or replica, the cluster health indicator turns yellow or red. Understanding why shards aren't allocating is crucial for maintaining a healthy, performant, and available search engine.

This guide dives deep into the common causes of shard allocation failures—from insufficient cluster resources to misconfigured settings—and provides actionable, practical solutions to resolve these issues, ensuring your data is properly indexed and searchable.


Understanding Shard States and Allocation

Before troubleshooting, it is essential to know what Elasticsearch is trying to do. Shards are the fundamental unit of distribution in Elasticsearch. They can exist in several states:

  • STARTED: The shard is active and serving requests (Primary or Replica).
  • RELOCATING: The shard is being moved from one node to another (during rebalancing or node addition/removal).
  • INITIALIZING: A new replica shard is being created from the primary.
  • UNASSIGNED: The shard exists in the cluster state metadata but is not allocated to any node, often because the required node is unavailable or criteria are not met.

Cluster health is determined by the presence of unassigned shards:

  • Green: All primary and replica shards are allocated.
  • Yellow: All primary shards are allocated, but one or more replica shards are unassigned.
  • Red: One or more primary shards are unassigned, meaning the data in those shards is currently unavailable and may be lost if no copy can be recovered.

Common Causes of Shard Allocation Failures

Shard allocation is managed by the cluster's allocation decider logic, which checks numerous factors before placing a shard. Failure usually stems from a violation of one of these decision points.

1. Insufficient Cluster Resources

This is perhaps the most frequent cause of allocation hangs, especially in dynamic environments.

Disk Space Thresholds

Elasticsearch automatically stops allocating shards to a node once its disk usage crosses predefined thresholds. By default, no new shards are allocated to a node above 85% disk usage, shards are actively moved off a node above 90%, and writes are blocked once usage reaches the 95% flood stage.

Default Thresholds (Check elasticsearch.yml or Cluster Settings):

  • cluster.routing.allocation.disk.watermark.low (default 85%): No new shards are allocated to a node above this usage.
  • cluster.routing.allocation.disk.watermark.high (default 90%): Elasticsearch attempts to relocate shards away from the node.
  • cluster.routing.allocation.disk.watermark.flood_stage (default 95%): Indices with a shard on the node are put into a read-only-allow-delete block, so indexing and write operations fail.

Solution: Check disk usage on all nodes and either free up space or add more disk space to the nodes that are crossing the high watermark.
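
You can confirm the watermarks currently in effect via the cluster settings API and, as a short-term stop-gap rather than a fix, raise the high watermark slightly while you reclaim space. The 92% value below is purely illustrative:

GET /_cluster/settings?include_defaults=true&flat_settings=true

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high": "92%"
  }
}

Remember to revert any temporary watermark change once disk usage is back under control.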

Memory and CPU Pressure

While less common than disk issues, persistent memory (heap) pressure or high CPU load on nodes can cause shard recoveries to fail or stall, leaving shards stuck in INITIALIZING or repeatedly failing to allocate.
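
A quick way to gauge node headroom is the _cat/nodes API; the columns selected here (heap, RAM, CPU, one-minute load) are just one reasonable subset:

GET /_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m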

2. Node Roles and Attributes Mismatch

Modern Elasticsearch deployments often use dedicated master, ingest, or coordinating nodes. Shards will not allocate to nodes that do not meet the required criteria.

Mismatched Allocation Rules

If you have configured specific index settings requiring shards to be placed on nodes tagged with certain attributes (e.g., fast SSDs), but no available node matches that tag, the shards will remain unassigned.

Example: An index created with index.routing.allocation.require.box_type: high_io will only allocate its shards on nodes started with node.attr.box_type: high_io.

Solution: Verify the allocation rules (allocation.require, allocation.include, allocation.exclude) for the affected index and ensure nodes possess the correct node.attr settings.
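
To compare what the index demands with what the nodes actually advertise, inspect both sides; the my_data index and box_type attribute below follow the example above, and setting the filter to null removes it entirely if no node can ever satisfy it:

GET /my_data/_settings

GET /_cat/nodeattrs?v&h=node,attr,value

PUT /my_data/_settings
{
  "index.routing.allocation.require.box_type": null
}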

3. Cluster State Stability and Primary Allocation Failures (Red Health)

If primary shards are unassigned (cluster is RED), it means the node that held the last primary copy has failed or left the cluster, and no available replica shard can be promoted to primary.

Common Scenarios:
* A node holding the only primary copy crashes unexpectedly.
* The node containing the primary shard is explicitly removed from the cluster before replicas were successfully copied.

Solution: If the failed node cannot be recovered quickly, you may need to manually force allocation of a remaining shard copy via the cluster reroute API (see Step 4 below), but this carries a high risk of data loss for those specific shards.

4. Shard Limits and Quotas

Elasticsearch imposes limits to prevent runaway shard creation that could destabilize the cluster.

Max Shards Per Node

If a node has reached its configured maximum number of shards (cluster.routing.allocation.total_shards_per_node), no further shards will be assigned to it, even if disk space is available.

Solution: Increase the total_shards_per_node limit (use caution, as too many shards per node can degrade performance) or add more nodes to the cluster to distribute the load.
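
A minimal sketch of raising the cluster-wide limit; the value of 1000 is illustrative and should be sized to your heap and hardware:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.total_shards_per_node": 1000
  }
}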

Diagnosing Allocation Failures: The Allocation Explain API

The Allocation Explain API is the single most powerful tool for diagnosing why a specific shard is not allocating. It simulates the decision-making process of the allocation deciders.

To use it, specify the index name, the shard number, and whether you are interested in the primary or a replica. Sending the request with no body at all asks Elasticsearch to explain the first unassigned shard it finds.

Example Usage (Checking Shard 0 of Index my_data):

GET /_cluster/allocation/explain?pretty
{
  "index": "my_data",
  "shard": 0,
  "primary": true
}

The response will detail every allocation decision made for that shard, explicitly stating which rule was violated (e.g., "[disk exceeding high watermark on node X]").

Reading the Output

Pay close attention to the top-level allocate_explanation field and the deciders entries under node_allocation_decisions. If a decider returns NO, its explanation message spells out the constraint being violated (e.g., disk usage, per-node shard limits, or an allocation filter excluding the node).
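
The exact wording varies by Elasticsearch version, but a heavily abbreviated response for a shard blocked by the disk threshold decider looks roughly like this (values are illustrative):

{
  "index": "my_data",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED"
  },
  "can_allocate": "no",
  "allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster.",
  "node_allocation_decisions": [
    {
      "node_name": "node-1",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node is above the high disk watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%]"
        }
      ]
    }
  ]
}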

Troubleshooting Steps and Commands

When faced with an UNASSIGNED state, follow this prioritized troubleshooting sequence:

Step 1: Check Cluster Health and Unassigned Shards

First, see the big picture.

GET /_cluster/health?pretty
GET /_cat/shards?h=index,shard,prirep,state,unassigned.reason,node

Look specifically at the unassigned.reason column from the cat API output. This often provides immediate clues (e.g., CLUSTER_RECOVERED, NODE_LEFT, INDEX_CREATED).

Step 2: Investigate Disk Space

If the reason points to disk pressure, check the actual usage across all nodes.

GET /_cat/allocation?v&h=node,disk.percent,disk.avail,disk.total

Action: If nodes are near 90% capacity, immediately start clearing logs, shrinking index retention, or adding disk capacity to those nodes.
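
To find the biggest candidates for deletion or shrinking, sort indices by on-disk size; the column selection here is just a convenient subset:

GET /_cat/indices?v&s=store.size:desc&h=index,pri,rep,store.size,pri.store.size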

Step 3: Use Allocation Explain for Complex Cases

If the cause isn't obvious resource pressure, run the Allocation Explain API as detailed above to pinpoint configuration mismatches.

Step 4: Manually Forcing Allocation (Use with Caution)

If a primary shard is UNASSIGNED (Red Health) because the node holding the only in-sync copy is permanently gone, but a stale copy of the shard still exists on another node, you can force the cluster to accept that stale copy as the new primary.

Warning: Any documents indexed after the stale copy fell out of sync are permanently lost. Only use this if you cannot recover the node that hosted the up-to-date primary.

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "index_name",
        "shard": 0,
        "node": "node_name_with_replica",
        "accept_data_loss": true
      }
    }
  ]
}

Step 5: Addressing Stuck Replicas (Yellow Health)

If only replicas are unassigned (Yellow Health) due to insufficient nodes or disk space, simply fixing the underlying resource constraint (adding nodes or clearing disk space) will usually cause the replicas to allocate automatically once the deciders permit it.
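
One caveat: if a replica has already failed allocation the maximum number of times (controlled by index.allocation.max_retries, 5 by default), the cluster will not retry it automatically even after the resource problem is fixed. A manual retry nudges it along:

POST /_cluster/reroute?retry_failed=true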

If you must proceed without adding resources, you can temporarily drop the replica count for the index to zero:

PUT /my_index/_settings
{
  "index.number_of_replicas": 0
}

After this change, cluster health should return to Green, since there are no longer any replica shards waiting to be allocated.
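
Once the underlying resource constraint is resolved, remember to restore the replica count; a value of 1 is the common default, but size it to your resilience requirements:

PUT /my_index/_settings
{
  "index.number_of_replicas": 1
}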

Best Practices for Preventing Allocation Issues

  1. Monitor Disk Watermarks Rigorously: Alert well before the low watermark (85%) is reached so you can intervene before allocation to the node stops, and long before writes are blocked at the 95% flood stage.
  2. Maintain Node Diversity: Ensure you have enough physical or virtual nodes such that if one fails, there are still enough available nodes that meet attribute requirements to host all primaries and required replicas.
  3. Use Allocation Awareness: For multi-zone or multi-rack deployments, configure cluster.routing.allocation.awareness.attributes to prevent all copies of a shard from landing in the same physical zone, mitigating zone-wide outages (see the sketch after this list).
  4. Set Realistic Replica Counts: Avoid setting replica counts higher than the number of physical nodes you can sustain, as this guarantees unassigned replicas during minor maintenance.
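
As a sketch of point 3, assume each node has been started with a custom attribute named zone in its elasticsearch.yml (for example, node.attr.zone: zone-a); awareness is then switched on cluster-wide:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}

With this in place, Elasticsearch will avoid placing a primary and its replica on nodes that share the same zone value.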

By proactively managing resources and utilizing the Allocation Explain API, administrators can quickly diagnose and resolve the factors preventing Elasticsearch shards from achieving optimal allocation.