Elasticsearch Cluster Setup: A Step-by-Step Configuration Guide
Configure an Elasticsearch cluster with safe node roles, discovery settings, networking, heap sizing, and health checks.
Setting up an Elasticsearch cluster is mostly about making a few early choices correctly: node names, discovery, roles, networking, and memory. If those basics are wrong, your cluster may form unreliably, expose itself on the wrong interface, or struggle during node restarts.
This guide walks through the core settings you should review before putting Elasticsearch behind real workloads. Examples use elasticsearch.yml and assume a small three-node cluster, but the same checks apply when you scale out.
Prerequisites
Before diving into the configuration, ensure you have the following in place:
- Java runtime: Many Elasticsearch distributions include a bundled JDK. If you provide your own Java runtime, check the support matrix for your Elasticsearch version. To print the version of a custom runtime, run:
java -version
- System Resources: Allocate sufficient RAM, CPU, and disk space for your Elasticsearch nodes. The exact requirements depend on your data volume and query complexity.
- Network Access: Ensure nodes can communicate with each other on the configured transport ports (default is 9300).
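One quick way to confirm the network prerequisite is to open a TCP connection from one node to another on the HTTP and transport ports. The Python sketch below uses only the standard library. `port_is_open` is a hypothetical helper, and the peer address is an assumption borrowed from the multi-node example later in this guide. A successful connection only shows that something is listening on the port, not that it is Elasticsearch.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Assumed peer address; replace with another node in your cluster.
    peer = "192.168.1.101"
    for port in (9200, 9300):  # default HTTP and transport ports
        state = "reachable" if port_is_open(peer, port) else "NOT reachable"
        print(f"{peer}:{port} is {state}")
```

Run it from each node against every other node. A failure on 9300 usually points to a firewall rule or a `network.host` bound only to loopback.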
Installation
While this guide focuses on configuration, a successful setup begins with a correct installation. Elasticsearch can be installed via package managers (apt, yum), by downloading the archive, or using Docker. Refer to the official Elasticsearch documentation for detailed installation instructions specific to your operating system or deployment method.
Core Configuration Files
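The primary configuration file for Elasticsearch is elasticsearch.yml. Archive (tar.gz or zip) installations keep it in the config/ directory of the installation, while DEB and RPM packages typically place it in /etc/elasticsearch. The settings in this file determine how each node joins and behaves in the cluster.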
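Elasticsearch also honors the `ES_PATH_CONF` environment variable, which points at the configuration directory. The small Python sketch below resolves where elasticsearch.yml should be, assuming an archive install whose home directory you pass in. `elasticsearch_yml_path` is a hypothetical helper for illustration.

```python
import os
from pathlib import Path


def elasticsearch_yml_path(es_home: str) -> Path:
    """Hypothetical helper: locate elasticsearch.yml, honoring ES_PATH_CONF.

    es_home is assumed to be the directory of an archive (tar.gz/zip) install.
    Package installs usually point ES_PATH_CONF at /etc/elasticsearch.
    """
    conf_dir = os.environ.get("ES_PATH_CONF")
    if conf_dir:
        return Path(conf_dir) / "elasticsearch.yml"
    # Archive installs keep their configuration under <es_home>/config.
    return Path(es_home) / "config" / "elasticsearch.yml"
```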
Cluster Setup: Key Configuration Directives
1. Cluster Name (cluster.name)
This setting uniquely identifies your cluster. All nodes in the same cluster must share the same cluster.name. If not set, it defaults to elasticsearch.
- Importance: Essential for nodes to discover and join the correct cluster. Different clusters in the same network should have distinct names.
- Example (elasticsearch.yml):
cluster.name: my-production-cluster
2. Node Role (node.roles)
Elasticsearch nodes can be assigned specific roles to optimize resource allocation and performance. Common roles include master, data, ingest, and ml. If node.roles is not set, the node takes on all of the default roles. In smaller clusters, a single node can hold multiple roles.
- Master-eligible node: Can be elected as the master node, which is responsible for cluster-wide actions like creating/deleting indices, tracking nodes, and allocating shards. Dedicated master-eligible nodes are recommended in production environments for stability.
node.roles: [ master ]
- Data node: Stores data and performs data-related operations like indexing and searching. Dedicated data nodes keep this heavy work away from the master-eligible nodes.
node.roles: [ data ]
- Ingest node: Pre-processes documents before indexing (e.g., using ingest pipelines).
node.roles: [ ingest ]
- Machine Learning node: Runs machine learning features such as anomaly detection.
node.roles: [ ml ]
- Coordinating-only node: Routes search and bulk requests and merges the results, but does not store data or participate in master election. Useful for offloading heavy query loads from data or master nodes.
node.roles: [ ]
Best Practice: In production, dedicate nodes to specific roles (e.g., separate master nodes from data nodes) for better fault tolerance and performance. For smaller setups, nodes can have combined roles.
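When planning roles, it helps to count master-eligible nodes, because master elections need a majority of them. The Python sketch below summarizes a planned cluster from each node's `node.roles` value. It is hypothetical and simplified: `None` stands for an unset `node.roles` and is treated as a master-eligible data node, and the failure tolerance uses the plain majority rule, n master-eligible nodes surviving the loss of (n - 1) // 2.

```python
from typing import Dict, List, Optional


def summarize_roles(nodes: Dict[str, Optional[List[str]]]) -> Dict[str, object]:
    """Hypothetical helper: summarize a cluster plan from node.roles values.

    None models a node without node.roles (default roles, simplified here to
    master + data). An empty list models a coordinating-only node.
    """
    master_eligible: List[str] = []
    data_nodes: List[str] = []
    coordinating_only: List[str] = []
    for name, roles in nodes.items():
        if roles is None:
            master_eligible.append(name)
            data_nodes.append(name)
        elif not roles:
            coordinating_only.append(name)
        else:
            if "master" in roles:
                master_eligible.append(name)
            # "data" and data tier roles such as data_hot all hold shards.
            if any(r == "data" or r.startswith("data_") for r in roles):
                data_nodes.append(name)
    n = len(master_eligible)
    return {
        "master_eligible": sorted(master_eligible),
        "data": sorted(data_nodes),
        "coordinating_only": sorted(coordinating_only),
        # Majority rule: n master-eligible nodes tolerate (n - 1) // 2 losses.
        "master_failures_tolerated": (n - 1) // 2 if n else 0,
    }
```

For the three-node example in this guide (all nodes `[ master, data ]`), the summary reports one tolerated master failure. A fourth master-eligible node does not raise that number, which is why odd counts such as three or five are common.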
3. Network Settings (network.host, http.port, transport.port)
These settings control how your Elasticsearch nodes communicate.
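network.host: The IP address or hostname the node binds to. For multi-node clusters, set this to an address reachable by other nodes. The special value _site_ selects the node's site-local addresses (such as 192.168.x.x), and 0.0.0.0 binds to all available network interfaces. Once network.host is a non-loopback address, Elasticsearch treats the node as a production deployment and enforces its bootstrap checks at startup.
network.host: 192.168.1.100
# or
network.host: _site_
# or
network.host: 0.0.0.0
http.port: The port for the HTTP REST API (default: 9200).
http.port: 9200
transport.port: The port for node-to-node communication (default: 9300).
transport.port: 9300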
Warning: Make sure firewall rules allow traffic between nodes on transport.port, and limit access to http.port to the clients that need it, especially when binding to 0.0.0.0.
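Before rolling out a `network.host` value, it can help to classify what it will expose. The Python sketch below uses the standard `ipaddress` module. `describe_network_host` is a hypothetical helper, and its list of special values is deliberately not exhaustive.

```python
import ipaddress

# Some special values accepted by network.host (not an exhaustive list).
SPECIAL_VALUES = {"_local_", "_site_", "_global_"}


def describe_network_host(value: str) -> str:
    """Hypothetical helper: a short note about what a network.host value exposes."""
    if value in SPECIAL_VALUES:
        return "special value resolved by Elasticsearch at startup"
    if value.startswith("_") and value.endswith("_"):
        # For example _eth0_, which selects the addresses of a named interface.
        return "special value (interface or other selector)"
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return "hostname, resolved via DNS"
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "all interfaces: restrict access with firewall rules"
    if addr.is_loopback:
        return "loopback only: other nodes cannot connect"
    return "specific address: reachable by other nodes if routed and allowed"
```

For example, `describe_network_host("0.0.0.0")` flags the all-interfaces case that the warning above is about.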
4. Discovery Settings (discovery.seed_hosts, cluster.initial_master_nodes)
These settings are crucial for nodes to find and join the cluster.
discovery.seed_hosts: A list of IP addresses or hostnames of other nodes in the cluster that a starting node contacts to discover the cluster.
discovery.seed_hosts:
- "host1:9300"
- "host2:9300"
- "192.168.1.101:9300"
cluster.initial_master_nodes: A list of master-eligible node names used only to bootstrap a brand-new cluster. The names must exactly match each node's node.name. Remove this setting after the cluster has formed, because stale bootstrap settings can cause confusion during later rebuilds or the accidental formation of a separate cluster.
cluster.initial_master_nodes:
- "node-1"
- "node-2"
- "node-3"
Tip: In cloud environments or dynamic networks, list DNS names in discovery.seed_hosts, or use a seed hosts provider such as the file-based provider or a cloud discovery plugin (for example, discovery-ec2 on AWS).
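A common bootstrap mistake is an entry in `cluster.initial_master_nodes` that does not match a node name character for character (for example a short name versus a fully qualified one). The Python sketch below checks the list against a map of node names to roles. It assumes every node sets `node.roles` explicitly, and `check_initial_master_nodes` is a hypothetical helper.

```python
from typing import Dict, List


def check_initial_master_nodes(
    initial_master_nodes: List[str], node_roles: Dict[str, List[str]]
) -> List[str]:
    """Hypothetical helper: return problems found in cluster.initial_master_nodes.

    node_roles maps each node.name to its explicitly configured node.roles.
    """
    problems: List[str] = []
    master_eligible = {n for n, roles in node_roles.items() if "master" in roles}
    for name in initial_master_nodes:
        if name not in node_roles:
            # Matching is exact and case-sensitive.
            problems.append(f"{name!r} does not match any node.name exactly")
        elif name not in master_eligible:
            problems.append(f"{name!r} is not master-eligible")
    if len(set(initial_master_nodes)) != len(initial_master_nodes):
        problems.append("list contains duplicate names")
    return problems
```

An empty result means the list is consistent with the planned nodes. It says nothing about whether those nodes can reach each other.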
Configuring a Multi-Node Cluster
To set up a multi-node cluster, configure each node's elasticsearch.yml file. Ensure that:
- cluster.name is identical on all nodes.
- Each node has a unique node.name (e.g., node-1, node-2).
- network.host is set to an IP address reachable by other nodes.
- discovery.seed_hosts lists the addresses of the master-eligible nodes (listing all of them is the simplest and safest choice).
- cluster.initial_master_nodes lists the node.name of each initial master-eligible node, and is set only for the first bootstrap of a new cluster.
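The checklist above can be automated once each node's settings are loaded into a dictionary. In the Python sketch below, `check_cluster_configs` is a hypothetical helper. It assumes flat setting names, explicit `node.roles`, and seed hosts written as `ip:port` strings that are compared literally, so hostnames and IP addresses are not cross-resolved. It takes the strict approach of requiring every master-eligible address in every node's seed list.

```python
from typing import Dict, List


def check_cluster_configs(configs: List[Dict[str, object]]) -> List[str]:
    """Hypothetical helper: check the multi-node checklist across nodes.

    Each dict holds flat settings as in elasticsearch.yml, e.g.
    {"cluster.name": ..., "node.name": ..., "network.host": ...,
     "transport.port": 9300, "node.roles": [...], "discovery.seed_hosts": [...]}.
    """
    problems: List[str] = []
    cluster_names = {c.get("cluster.name") for c in configs}
    if len(cluster_names) != 1:
        problems.append(f"cluster.name differs: {sorted(map(str, cluster_names))}")
    node_names = [c.get("node.name") for c in configs]
    if len(set(node_names)) != len(node_names):
        problems.append("node.name values are not unique")
    # Master-eligible addresses in the host:port form used by seed_hosts.
    master_addresses = {
        f"{c['network.host']}:{c.get('transport.port', 9300)}"
        for c in configs
        if "master" in c.get("node.roles", [])
    }
    for c in configs:
        missing = master_addresses - set(c.get("discovery.seed_hosts", []))
        if missing:
            problems.append(f"{c.get('node.name')}: seed_hosts missing {sorted(missing)}")
    return problems
```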
Example for node-1:
cluster.name: my-production-cluster
node.name: node-1
node.roles: [ master, data ]
network.host: 192.168.1.100
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
- "192.168.1.100:9300"
- "192.168.1.101:9300"
- "192.168.1.102:9300"
cluster.initial_master_nodes:
- "node-1"
- "node-2"
- "node-3"
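Example for node-2 (identical to node-1 except for node.name and network.host; node-3 follows the same pattern with node.name: node-3 and network.host: 192.168.1.102):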
cluster.name: my-production-cluster
node.name: node-2
node.roles: [ master, data ]
network.host: 192.168.1.101
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
- "192.168.1.100:9300"
- "192.168.1.101:9300"
- "192.168.1.102:9300"
cluster.initial_master_nodes:
- "node-1"
- "node-2"
- "node-3"
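Because the per-node files differ only in `node.name` and `network.host`, they can be generated from one template, which avoids copy-paste drift between nodes. The Python sketch below reproduces the layout of the examples above. The `NODES` table and `render_node_config` are hypothetical, and the addresses are the example values from this guide.

```python
from typing import Dict, List

# Hypothetical node table: node.name -> network.host (example addresses).
NODES: Dict[str, str] = {
    "node-1": "192.168.1.100",
    "node-2": "192.168.1.101",
    "node-3": "192.168.1.102",
}


def render_node_config(name: str, nodes: Dict[str, str],
                       cluster: str = "my-production-cluster",
                       bootstrap: bool = True) -> str:
    """Hypothetical helper: render elasticsearch.yml text for one node."""
    lines: List[str] = [
        f"cluster.name: {cluster}",
        f"node.name: {name}",
        "node.roles: [ master, data ]",
        f"network.host: {nodes[name]}",
        "http.port: 9200",
        "transport.port: 9300",
        "discovery.seed_hosts:",
    ]
    lines += [f'- "{host}:9300"' for host in nodes.values()]
    if bootstrap:
        # Only for the very first start of a new cluster; drop it afterwards.
        lines.append("cluster.initial_master_nodes:")
        lines += [f'- "{node}"' for node in nodes]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for node_name in NODES:
        print(f"# --- {node_name} ---")
        print(render_node_config(node_name, NODES))
```

Re-rendering with `bootstrap=False` after the cluster has formed gives the files without `cluster.initial_master_nodes`.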
5. Heap Size (jvm.options)
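Elasticsearch uses a significant amount of memory. The Java Virtual Machine (JVM) heap size is configured through JVM options in the config/ directory. In recent versions, Elasticsearch sizes the heap automatically based on the node's roles and total memory, and manual overrides belong in a file under config/jvm.options.d/ rather than in jvm.options itself. If you set the heap manually, use the same value for the minimum and maximum heap size to avoid performance issues caused by heap resizing.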
- Best Practice: Set the heap to no more than 50% of system RAM, leaving the rest for the operating system's filesystem cache. Also keep the heap below the compressed ordinary object pointer (compressed oops) threshold, which is close to 32 GB on most JVMs. The exact cutoff varies, so check the Elasticsearch startup log, which reports whether compressed oops are in use.
Example (jvm.options):
-Xms4g
-Xmx4g
This sets both the initial and maximum heap size to 4 gigabytes.
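The sizing rule above can be written down as simple arithmetic when you override the automatic heap sizing. The Python sketch below is a rule of thumb, not an official formula. `recommended_heap_gb` and `jvm_heap_options` are hypothetical helpers, the 31 GB cap is an assumed safety margin under the compressed-oops threshold, and sizes are rounded down to whole gigabytes.

```python
def recommended_heap_gb(total_ram_gb: int, cap_gb: int = 31) -> int:
    """Hypothetical helper: at most half of RAM, capped below ~32 GB.

    cap_gb=31 is an assumed margin under the compressed-oops threshold;
    confirm compressed oops are in use via the Elasticsearch startup log.
    """
    if total_ram_gb < 2:
        # This sketch only handles whole gigabytes of heap.
        raise ValueError("need at least 2 GB of RAM for this rule of thumb")
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_options(total_ram_gb: int) -> str:
    """Return matching -Xms/-Xmx lines for a JVM options file."""
    heap = recommended_heap_gb(total_ram_gb)
    # Identical minimum and maximum avoid heap resizing at runtime.
    return f"-Xms{heap}g\n-Xmx{heap}g\n"
```

For an 8 GB machine this yields the `-Xms4g` / `-Xmx4g` pair shown above. For a 128 GB machine the cap applies and the heap stays at 31 GB, leaving the remaining memory to the filesystem cache.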
6. Shard Allocation and Replication (cluster.routing.*)
These settings control how shards are distributed and replicated across nodes.
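cluster.routing.allocation.disk.watermark.low, high, flood_stage: Disk usage thresholds. Above the low watermark, Elasticsearch stops allocating shards to the node (primaries of newly created indices are exempt). Above the high watermark, it tries to move shards away from the node. Above the flood stage, every index with a shard on the node gets a read-only block that still allows deletes.
cluster.routing.allocation.enable: Controls which shards may be allocated (all, primaries, new_primaries, or none).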
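Example (elasticsearch.yml; these values match the defaults, and the settings can also be changed at runtime through the cluster update settings API):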
cluster.routing.allocation.disk.watermark.low: "85%"
cluster.routing.allocation.disk.watermark.high: "90%"
cluster.routing.allocation.disk.watermark.flood_stage: "95%"
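To reason about where a node's disk sits relative to these thresholds, the Python sketch below maps a usage percentage to the highest watermark it exceeds. `watermark_stage` is a hypothetical helper. It defaults to the percentages in the example, treats a value as exceeding a watermark only when it is strictly greater, and ignores the absolute byte values that Elasticsearch also accepts for watermarks.

```python
def watermark_stage(used_percent: float, low: float = 85.0,
                    high: float = 90.0, flood_stage: float = 95.0) -> str:
    """Hypothetical helper: the highest disk watermark used_percent exceeds."""
    if not low <= high <= flood_stage:
        raise ValueError("watermarks must satisfy low <= high <= flood_stage")
    if used_percent > flood_stage:
        return "flood_stage"  # indices on the node become read-only
    if used_percent > high:
        return "high"  # shards are moved away from the node
    if used_percent > low:
        return "low"  # no new shards are allocated to the node
    return "ok"
```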
Verifying Cluster Health
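Once the nodes are started, check the cluster's health and status with the Cluster Health API. The command below assumes security is disabled. With security enabled (the default since Elasticsearch 8.0), use https://localhost:9200 and pass credentials and the CA certificate, for example with curl's -u and --cacert options.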
curl -X GET "localhost:9200/_cluster/health?pretty"
Key output fields:
- status: green (all shards allocated), yellow (all primary shards allocated, but some replicas unassigned), red (some primary shards unassigned).
- number_of_nodes: The total number of nodes in the cluster.
- number_of_data_nodes: The number of nodes with a data role.
- active_shards, relocating_shards, initializing_shards, unassigned_shards: Shard counts by state.
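Tip: Aim for a green status. A yellow status means all primary shards are allocated but at least one replica is not, so some data has no redundant copy. On a single-node cluster, yellow is expected for indices with replicas, because a replica is never allocated to the same node as its primary. A red status means at least one primary shard is unassigned, so some data is unavailable for search and indexing and requires immediate attention.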
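For scripted checks, the health response can be fetched and summarized in a few lines. The Python sketch below uses only the standard library. `fetch_health` and `summarize_health` are hypothetical helpers. `fetch_health` assumes plain HTTP with security disabled; with security enabled you would need TLS and credentials. The sample values used to exercise `summarize_health` are made up.

```python
import json
import urllib.request
from typing import Any, Dict


def summarize_health(health: Dict[str, Any]) -> str:
    """Hypothetical helper: one-line summary of a _cluster/health response."""
    status = health["status"]
    meaning = {
        "green": "all shards allocated",
        "yellow": "all primaries allocated, some replicas unassigned",
        "red": "at least one primary shard unassigned",
    }.get(status, "unknown status")
    return (f"{health['cluster_name']}: {status} ({meaning}); "
            f"nodes={health['number_of_nodes']}, "
            f"data_nodes={health['number_of_data_nodes']}, "
            f"unassigned_shards={health['unassigned_shards']}")


def fetch_health(base_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """Fetch cluster health over plain HTTP (assumes security is disabled)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)
```

A monitoring job could call `summarize_health(fetch_health())` and alert whenever the status is not green.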
Next Steps
After successfully setting up your Elasticsearch cluster, you'll typically proceed to:
- Index Creation: Define how your data will be stored and organized.
- Mapping: Define the schema for your documents, specifying data types for fields.
- Analyzers: Configure text analysis for effective full-text search.
- Security: Implement authentication and authorization.
This guide provides the essential groundwork for a stable and performant Elasticsearch cluster. Continuous monitoring and tuning based on your specific workload are key to long-term success.