Elasticsearch Shard Allocation Issues: Causes and Solutions

Elasticsearch shard allocation issues usually show up as yellow or red cluster health. Yellow means your primary shards are assigned but at least one replica is not. Red means at least one primary shard is unassigned, so some data may be unavailable until you recover it.

This guide shows you how to find the allocation blocker, read the Allocation Explain API output, and choose the least risky fix. The goal is to restore allocation without making data loss worse.

Understanding Shard States and Cluster Health

Shards are the unit Elasticsearch places across data nodes. They can exist in several states:

STARTED: The shard is active and serving requests.
RELOCATING: The shard is moving from one node to another.
INITIALIZING: The shard is being created or recovered.
UNASSIGNED: The shard exists in cluster metadata but is not allocated to a node.

Cluster health follows those shard states:

Green: All primary and replica shards are allocated.
Yellow: All primary shards are allocated, but one or more replicas are unassigned.
Red: One or more primary shards are unassigned. Searches may return partial results or fail for affected indices, and writes to those indices may fail.
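
The mapping from shard states to a health color can be sketched in a few lines of Python. This is a simplified illustration, not Elasticsearch's own health calculation: `health_color` and the `(is_primary, state)` input shape are hypothetical, and the sketch treats every copy that is not STARTED or RELOCATING as missing.

```python
# Simplified sketch: derive a health color from shard copy states.
# Assumption: a copy counts as active only when STARTED or RELOCATING.
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def health_color(shard_copies):
    """Return 'green', 'yellow', or 'red' for (is_primary, state) pairs."""
    color = "green"
    for is_primary, state in shard_copies:
        if state in ACTIVE_STATES:
            continue
        if is_primary:
            return "red"  # an inactive primary outranks everything else
        color = "yellow"  # an inactive replica degrades green to yellow
    return color


copies = [(True, "STARTED"), (False, "UNASSIGNED")]
print(health_color(copies))  # yellow
```

Elasticsearch's real calculation handles more cases than this, so treat the output of GET /_cluster/health as the source of truth.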

Common Causes of Shard Allocation Failures

Elasticsearch runs every candidate node through a set of allocation deciders before placing a shard. A single NO from any decider rules that node out, and the shard stays unassigned when every node is ruled out.

Disk Watermarks

Disk pressure is one of the most common causes. Elasticsearch uses disk watermarks to avoid filling a node. Above the low watermark, Elasticsearch stops allocating new shards to the node; above the high watermark, it also tries to relocate shards away. At the flood-stage watermark, Elasticsearch adds the index.blocks.read_only_allow_delete block to every index with a shard on that node, which rejects writes to protect the node from running out of disk.

Setting	Common Default	Effect
`cluster.routing.allocation.disk.watermark.low`	85%	Avoids allocating additional shards to nodes above this threshold. Primaries of newly created indices are exempt.
`cluster.routing.allocation.disk.watermark.high`	90%	Tries to move shards away and avoids placing shards on the node.
`cluster.routing.allocation.disk.watermark.flood_stage`	95%	Can block writes on affected indices.
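
To see how a node's disk usage maps onto the thresholds in the table, here is a minimal Python sketch. The function name `watermark_stage` and the node names are made up for illustration, the defaults are the common values from the table, and the sketch assumes percentage watermarks with a strict "greater than" comparison. Watermarks can also be configured as absolute byte values, which this sketch does not handle.

```python
# Sketch: classify disk usage against percentage-based watermarks.
# Assumptions: thresholds are percentages of used disk and a node is
# "above" a watermark only when usage is strictly greater than it.
def watermark_stage(used_percent, low=85.0, high=90.0, flood=95.0):
    """Return 'ok', 'low', 'high', or 'flood_stage' for a usage percentage."""
    if not low <= high <= flood:
        raise ValueError("expected low <= high <= flood_stage")
    if used_percent > flood:
        return "flood_stage"
    if used_percent > high:
        return "high"
    if used_percent > low:
        return "low"
    return "ok"


# Hypothetical nodes and usage figures, for illustration only.
for node, used in {"node-a": 72.0, "node-b": 91.5, "node-c": 97.2}.items():
    print(node, watermark_stage(used))
```

Pass your cluster's real thresholds as arguments if they differ from the defaults.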

Confirm your cluster's actual settings before changing anything:

GET /_cluster/settings?include_defaults=true&filter_path=**.disk.watermark*

Then check node disk usage:

GET /_cat/allocation?v&h=node,disk.percent,disk.avail,disk.total,shards

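To relieve disk pressure, free space, add disk, add data nodes, delete old indices, or reduce the replica count. On Elasticsearch 7.4 and later, the flood-stage block is released automatically once disk usage drops below the high watermark. On older versions, or to clear it sooner, remove the block yourself, but only after disk pressure is fixed: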

PUT /my_index/_settings
{
  "index.blocks.read_only_allow_delete": null
}

Node Roles and Allocation Filters

Index shards allocate only to nodes with a data role and matching allocation filters. If you use node attributes for hot/warm tiers, racks, zones, or storage types, a typo can strand shards.

For example, an index with index.routing.allocation.require.box_type: high_io will only allocate on nodes configured with node.attr.box_type: high_io.

Check index filters and node attributes:

GET /my_index/_settings?filter_path=*.settings.index.routing.allocation
GET /_cat/nodeattrs?v
GET /_cat/nodes?v&h=name,node.role,disk.used_percent

Fix the index setting or add an eligible data node. Do not remove allocation awareness casually in multi-zone clusters; it may place all copies of a shard in the same failure domain.
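
The way a `require` filter strands shards can be sketched in Python. This is a deliberately simplified model: it assumes exact string matching on a single value per attribute and ignores `include` and `exclude` filters, comma-separated values, and wildcards. The function names and node names are hypothetical.

```python
# Sketch: which nodes satisfy an index's require.* allocation filters?
# Assumption: exact string match on one value per attribute.
def unmet_requirements(require_filters, node_attrs):
    """Return the required attributes that a node does not satisfy."""
    return {
        attr: wanted
        for attr, wanted in require_filters.items()
        if node_attrs.get(attr) != wanted
    }


def eligible_nodes(require_filters, nodes):
    """Return the names of nodes that satisfy every required attribute."""
    return [
        name
        for name, attrs in nodes.items()
        if not unmet_requirements(require_filters, attrs)
    ]


nodes = {
    "node-1": {"box_type": "high_io"},
    "node-2": {"box_type": "hi_io"},  # typo in the node attribute
    "node-3": {},                     # attribute not set at all
}
print(eligible_nodes({"box_type": "high_io"}, nodes))  # ['node-1']
```

An empty result for your own filters and attributes means no node can host the shard until you fix one side of the match.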

Missing Primary Shards

If a primary shard is unassigned, the node that held the active primary may be gone, the index may have just been restored, or allocation rules may be blocking every eligible node. Do not assume data is lost until the Allocation Explain API tells you why Elasticsearch cannot allocate the shard.

Common scenarios include:

A node holding the only good primary copy crashed.
Allocation filters exclude every data node that could host the primary.
A snapshot restore or index creation is waiting for eligible nodes.
A stale shard copy exists, but Elasticsearch will not promote it without explicit acceptance of data loss.

First try to recover the missing node, restore a snapshot, or fix the allocation blocker. Use forced primary allocation only when you understand which copy is stale or when you have accepted data loss for that shard.

Shard Limits

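Shard-per-node limits can also block allocation. index.routing.allocation.total_shards_per_node caps how many shards of a single index one node may hold, and cluster.routing.allocation.total_shards_per_node caps the shards of all indices on one node. Both are unlimited by default.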

Check for those limits:

GET /_cluster/settings?include_defaults=true&filter_path=**.total_shards_per_node
GET /my_index/_settings?filter_path=*.settings.index.routing.allocation.total_shards_per_node

Add nodes, reduce replica count, consolidate small indices, or cautiously raise the relevant limit. Too many shards per node can increase heap pressure and slow cluster-state operations.
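
A quick calculation shows whether a per-index limit can be satisfied at all. The sketch below assumes the limit is the only constraint (no disk pressure, filters, or awareness rules) and that it is a positive number. `min_nodes_for_limit` is a hypothetical helper, not an Elasticsearch API.

```python
import math


# Sketch: how many data nodes does an index need so that no node exceeds
# index.routing.allocation.total_shards_per_node?
# Assumption: the limit is positive and is the only constraint in play.
def min_nodes_for_limit(primaries, replicas, total_shards_per_node):
    """Return the minimum node count that fits all shard copies."""
    if total_shards_per_node < 1:
        raise ValueError("this sketch needs a positive limit")
    copies = primaries * (1 + replicas)  # every primary plus its replicas
    return math.ceil(copies / total_shards_per_node)


# 5 primaries with 1 replica each is 10 copies; at 2 per node that needs 5 nodes.
print(min_nodes_for_limit(primaries=5, replicas=1, total_shards_per_node=2))  # 5
```

If the result is larger than the number of eligible data nodes, some copies of that index stay unassigned until you add nodes, lower the replica count, or raise the limit.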

Diagnosing with the Allocation Explain API

The Allocation Explain API is the best tool for answering "why is this shard not allocating?"

GET /_cluster/allocation/explain?pretty
{
  "index": "my_data",
  "shard": 0,
  "primary": true
}

To let Elasticsearch pick an arbitrary unassigned shard, call the API with no body (the request fails if the cluster has no unassigned shards):

GET /_cluster/allocation/explain?pretty

Read these fields first:

can_allocate: The high-level answer.
allocate_explanation: The plain-English summary.
node_allocation_decisions: Per-node decisions.
deciders: The exact rule that returned NO or THROTTLE.

A NO decision is the blocker. A THROTTLE decision usually means Elasticsearch can allocate the shard but is limiting concurrent recovery work.
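
When the response covers many nodes, a short script can pull out just the NO deciders. The sketch below works on the parsed JSON response as a Python dict. `blocking_deciders` is a hypothetical helper, and the sample response is hand-written and heavily abbreviated, not captured output from a real cluster.

```python
# Sketch: list the deciders that returned NO, grouped by node.
def blocking_deciders(explain):
    """Map node name -> [(decider, explanation), ...] for NO decisions."""
    blockers = {}
    for node in explain.get("node_allocation_decisions", []):
        reasons = [
            (d.get("decider"), d.get("explanation", ""))
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        if reasons:
            name = node.get("node_name") or node.get("node_id")
            blockers[name] = reasons
    return blockers


# Hand-written, abbreviated sample; explanation strings are placeholders.
sample = {
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node-1",
            "node_decision": "no",
            "deciders": [
                {"decider": "disk_threshold", "decision": "NO",
                 "explanation": "(explanation text)"},
            ],
        },
        {
            "node_name": "node-2",
            "node_decision": "throttled",
            "deciders": [
                {"decider": "throttling", "decision": "THROTTLE",
                 "explanation": "(explanation text)"},
            ],
        },
    ],
}
for node, reasons in blocking_deciders(sample).items():
    print(node, reasons)
```

Nodes that only report THROTTLE are left out on purpose, because they usually resolve on their own once recoveries finish.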

Safe Troubleshooting Sequence

Start broad, then narrow down.

1. Check Cluster Health and Unassigned Shards

GET /_cluster/health?pretty
GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node

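Look at unassigned.reason. NODE_LEFT means the node hosting the shard left the cluster, INDEX_CREATED means a newly created index has not yet found an eligible node, CLUSTER_RECOVERED means the shard is waiting after a full cluster restart, and ALLOCATION_FAILED means an allocation attempt failed.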
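
On clusters with many shards, grouping the unassigned ones by reason makes the pattern easier to see. The sketch below assumes you fetched the same columns with `format=json` added to the request, which returns one object per shard with string values. `unassigned_by_reason` and the sample rows are hypothetical.

```python
from collections import defaultdict


# Sketch: group unassigned shards by unassigned.reason.
# Input: rows parsed from GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason,node
def unassigned_by_reason(rows):
    """Map reason -> [(index, shard, 'primary' or 'replica'), ...]."""
    groups = defaultdict(list)
    for row in rows:
        if row.get("state") != "UNASSIGNED":
            continue
        kind = "primary" if row.get("prirep") == "p" else "replica"
        reason = row.get("unassigned.reason") or "UNKNOWN"
        groups[reason].append((row.get("index"), row.get("shard"), kind))
    return dict(groups)


# Made-up rows for illustration only.
rows = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED",
     "unassigned.reason": None, "node": "node-1"},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "UNASSIGNED",
     "unassigned.reason": "NODE_LEFT", "node": None},
    {"index": "logs-2", "shard": "1", "prirep": "p", "state": "UNASSIGNED",
     "unassigned.reason": "ALLOCATION_FAILED", "node": None},
]
for reason, shards in unassigned_by_reason(rows).items():
    print(reason, shards)
```

Start with any group that contains primaries, since those are the shards that turn the cluster red.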

2. Check Disk and Node Eligibility

GET /_cat/allocation?v&h=node,disk.percent,disk.avail,disk.total
GET /_cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu,disk.used_percent

If nodes are near the high watermark, fix disk pressure before changing allocation settings.

3. Run Allocation Explain

Use the affected index, shard number, and primary/replica flag. The output should name the setting, node condition, or decider that blocks allocation.

4. Avoid Risky Reroutes Until You Know the Cause

Manual reroute commands are for specific recovery cases. They are not a general fix for disk pressure, bad filters, or too many replicas.

If a stale primary copy is the only practical recovery path, the command looks like this:

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "index_name",
        "shard": 0,
        "node": "node_name_with_stale_copy",
        "accept_data_loss": true
      }
    }
  ]
}

accept_data_loss: true is required for a reason. Use it only after you have checked snapshots, tried to recover the missing node, and confirmed which node holds the stale copy.
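
One way to keep this command from being sent by accident is to build it through a small wrapper that refuses to proceed without an explicit confirmation and defaults to the reroute API's dry_run query parameter, which simulates the change without applying it. The sketch below only builds the request; `stale_primary_reroute` is a hypothetical helper and the index and node names are the same placeholders used above.

```python
import json


# Sketch: build (but do not send) an allocate_stale_primary reroute request.
# The caller must opt in to data loss, and dry_run defaults to True.
def stale_primary_reroute(index, shard, node, accept_data_loss=False, dry_run=True):
    """Return (method, path, body) for the reroute request."""
    if not accept_data_loss:
        raise ValueError("allocate_stale_primary requires accepting data loss")
    path = "/_cluster/reroute" + ("?dry_run=true" if dry_run else "")
    body = {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }
    return "POST", path, body


method, path, body = stale_primary_reroute(
    "index_name", 0, "node_name_with_stale_copy", accept_data_loss=True
)
print(method, path)
print(json.dumps(body, indent=2))
```

Review the dry-run response first, then rebuild the request with dry_run=False only when the simulated result matches what you expect.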

5. Handle Yellow Health Separately

If only replicas are unassigned, the cluster can still serve primary data. Fix the underlying resource constraint first. Adding a data node, clearing disk, or correcting allocation filters usually lets Elasticsearch assign replicas automatically.

If you must run temporarily without replicas, reduce the replica count for the affected index:

PUT /my_index/_settings
{
  "index.number_of_replicas": 0
}

This can make health turn green because Elasticsearch no longer expects replica copies for that index. It also reduces availability, so set replicas back to the desired value after you add capacity or fix allocation.
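
When deciding between adding nodes and lowering the replica count, it helps to know how many replicas your current node count can hold. Elasticsearch never places two copies of the same shard on one node, so an index with R replicas needs at least R + 1 eligible data nodes. The sketch below assumes that rule is the only constraint (no disk pressure, filters, or awareness settings). `assignable_replicas` is a hypothetical helper.

```python
# Sketch: how many replicas per shard can be assigned with the nodes you have?
# Assumption: the only constraint is that copies of one shard must sit on
# different nodes.
def assignable_replicas(data_nodes, number_of_replicas):
    """Return how many of the requested replicas can actually be placed."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    # One node holds the primary; each replica needs a node of its own.
    return min(number_of_replicas, data_nodes - 1)


print(assignable_replicas(data_nodes=2, number_of_replicas=2))  # 1
```

If the result is lower than the configured replica count, the index stays yellow until you add a node or lower the count.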

Preventing Allocation Issues

Alert before nodes cross the high disk watermark.
Keep enough data nodes available for your replica count and allocation awareness rules.
Use shard counts that fit your heap, data volume, and recovery targets.
Review index templates so new indices do not inherit bad replica counts or allocation filters.
Test node replacement and snapshot restore steps before an incident.

Takeaway

Your safest path is simple: identify the unassigned shard, run Allocation Explain, fix the decider that says NO, and avoid forced allocation unless you have accepted the data-loss tradeoff.