Optimizing Slow Elasticsearch Queries: Best Practices for Performance Tuning
Diagnose and improve slow Elasticsearch queries with better query shape, pagination, caching, mappings, and the Profile API.
Slow Elasticsearch queries usually come from one of four places: the query asks for too much, the mapping makes the query expensive, the cluster is short on resources, or the application repeats costly searches that should be cached or redesigned. The fix depends on which one is true.
Before rewriting everything, capture a real slow request with its index, filters, sort, aggregations, page depth, response size, and timing. A dashboard aggregation, an autocomplete query, and an export job all stress Elasticsearch differently.
Understanding Query Performance Bottlenecks
Before diving into solutions, it's helpful to understand common reasons behind slow Elasticsearch queries. These often include:
- Complex Queries: Queries with multiple bool clauses, nested queries, or expensive operations like wildcard or regexp on large datasets.
- Inefficient Data Retrieval: Fetching _source unnecessarily, or retrieving large numbers of documents for pagination.
- Resource Constraints: Insufficient CPU, memory, or disk I/O on data nodes.
- Suboptimal Mappings: Using incorrect data types or not leveraging doc_values for aggregations.
- Shard Imbalance or Overload: Too many shards, too few shards, or uneven distribution of shards/data.
- Cache Misses or Bad Cache Fit: Repeating expensive searches without using request caching, filter context, or application-level caching where appropriate.
Optimizing Query Structure
The way you construct your queries has a profound impact on their performance. Small changes can lead to significant improvements.
1. Retrieve Only Necessary Fields (_source Filtering & stored_fields)
By default, Elasticsearch returns the entire _source field for each matching document. If your documents are large and the UI only needs a title, ID, and timestamp, fetching the whole document wastes network bandwidth and parsing time.
_source filtering: Use the _source parameter to specify an array of fields to include, or an object with includes and excludes lists.
GET /my-index/_search
{
  "_source": ["title", "author", "publish_date"],
  "query": {
    "match": {
      "content": "Elasticsearch performance"
    }
  }
}
stored_fields: If you've explicitly stored specific fields in your mapping ("store": true), you can retrieve them with stored_fields. Most deployments do not store many fields this way, so _source filtering is the more common fix.
GET /my-index/_search
{
  "stored_fields": ["title", "author"],
  "query": {
    "match": {
      "content": "Elasticsearch performance"
    }
  }
}
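As a rough sketch of how an application might assemble such a request, the helper below builds a search body that asks only for the fields the UI renders. It is plain Python that produces the JSON body; `build_search_body` and its parameters are hypothetical names, and sending the body to the cluster is deliberately left out.

```python
import json


def build_search_body(query_text, fields, size=10):
    """Return a search body that limits _source to the given fields.

    Hypothetical helper: the "content" field mirrors the example above.
    """
    return {
        "size": size,
        "_source": list(fields),  # only these fields come back per hit
        "query": {"match": {"content": query_text}},
    }


body = build_search_body(
    "Elasticsearch performance", ["title", "author", "publish_date"]
)
print(json.dumps(body, indent=2))
```

Keeping the field list next to the code that renders the results makes it harder for the two to drift apart.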
2. Prefer Efficient Query Types
Some query types are inherently more resource-intensive than others.
Avoid Leading Wildcards and Broad Regexps: wildcard and regexp queries can be expensive, especially with leading wildcards such as *test. Prefix queries are usually more manageable than leading wildcard searches, but they still need sensible mappings and bounded input.
# Inefficient - avoid leading wildcard
{
  "query": {
    "wildcard": {
      "name.keyword": {
        "value": "*search"
      }
    }
  }
}
# Better - if you know the prefix
{
  "query": {
    "prefix": {
      "name.keyword": {
        "value": "Elastic"
      }
    }
  }
}
Use match_phrase for phrase intent: If the user is searching for an exact phrase, match_phrase expresses that intent better than several unrelated match clauses. It is not always cheaper, but it avoids returning documents that only contain the words far apart.
Filter context for yes/no conditions: When you only care whether a document matches a condition, put that condition in filter context or use constant_score. This avoids unnecessary scoring work and is more cache-friendly.
GET /my-index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}
3. Optimize Boolean Queries
- Use filters for structured constraints: Put tenant IDs, status values, date ranges, and exact tags in filter, not must, unless they need scoring. Elasticsearch can reorder and optimize clauses internally, so do not rely on JSON order as your main performance tool.
- Use minimum_should_match intentionally: It can improve relevance and reduce broad matches, but setting it too high can hide valid results.
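The split between scored and unscored clauses can be sketched as a small builder. This is an illustration only: `build_bool_query`, the field names, and the tenant value are hypothetical, and the function only assembles the request body.

```python
import json


def build_bool_query(text_field, text, exact_filters, date_range=None):
    """Put the scored full-text clause in must and structured constraints in filter."""
    filters = [{"term": {field: value}} for field, value in exact_filters.items()]
    if date_range is not None:
        field, gte = date_range
        filters.append({"range": {field: {"gte": gte}}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],  # contributes to _score
                "filter": filters,                        # yes/no only, no scoring
            }
        }
    }


query = build_bool_query(
    "title",
    "Elasticsearch",
    {"tenant_id": "t-42", "status": "published"},  # hypothetical values
    date_range=("publish_date", "2023-01-01"),
)
print(json.dumps(query, indent=2))
```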
4. Efficient Pagination (search_after and scroll)
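Traditional from/size pagination becomes very inefficient for deep pages (e.g., from: 10000, size: 10). Each shard has to collect and sort its top from + size hits, and the coordinating node then merges those lists and discards the first from results. By default, requests where from + size exceeds index.max_result_window (10,000) are rejected outright.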
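A quick back-of-the-envelope calculation shows how the cost grows with page depth. The function name and the 5-shard index are assumptions made for illustration.

```python
def merged_hit_count(from_, size, shards):
    """Entries the coordinating node must merge for one from/size page."""
    per_shard = from_ + size  # each shard returns its own top from + size hits
    return per_shard * shards


# Assumed example: a 5-shard index (the shard count is not from the text above).
print(merged_hit_count(0, 10, 5))        # first page
print(merged_hit_count(10_000, 10, 5))   # page starting at from: 10000
```

Under these assumptions the first page merges 50 entries and the deep page merges 50,050, all to return ten hits.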
search_after: For real-time deep pagination, search_after is recommended. It uses the sort values of the previous page's last document to find the next set of results, similar to cursors in traditional databases. It is stateless and scales better than deep from/size paging. The sort needs a unique tiebreaker so that no document is skipped or repeated. Sorting on _id, as in this example, may be disabled or costly in recent Elasticsearch versions, so a dedicated keyword field is often the safer tiebreaker.
# First request
GET /my-index/_search
{
  "size": 10,
  "query": {"match_all": {}},
  "sort": [{"timestamp": "asc"}, {"_id": "asc"}]
}
# Subsequent request using the sort values of the last document from the first request
GET /my-index/_search
{
  "size": 10,
  "query": {"match_all": {}},
  "search_after": [1678886400000, "doc_id_XYZ"],
  "sort": [{"timestamp": "asc"}, {"_id": "asc"}]
}
scroll API: For bulk retrieval of large datasets, such as reindexing or exports, scroll can still be useful. For newer Elasticsearch versions and long-running full-index scans, also consider a point-in-time (PIT) plus search_after, which keeps the view of the index consistent across pages. Scroll is not suitable for user-facing real-time pagination.
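The paging loop itself is short. The sketch below is a minimal illustration, not a client implementation: `search` stands for whatever callable sends a body to the cluster, `fake_search` is an in-memory stand-in so the loop can be run without Elasticsearch, and `doc_id` is a hypothetical keyword tiebreaker field.

```python
def iterate_hits(search, base_body, page_size=2):
    """Yield every hit, following search_after page by page.

    `search` is any callable that takes a request body and returns a
    response shaped like an Elasticsearch search response.
    """
    body = dict(base_body, size=page_size)
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The last hit's sort values become the cursor for the next page.
        body = dict(body, search_after=hits[-1]["sort"])


def fake_search(body):
    """In-memory stand-in for the cluster, sorted by (timestamp, doc_id)."""
    docs = [(100, "a"), (100, "b"), (200, "c"), (300, "d"), (300, "e")]
    after = tuple(body.get("search_after", (float("-inf"), "")))
    page = [d for d in docs if d > after][: body["size"]]
    return {"hits": {"hits": [{"_id": d[1], "sort": list(d)} for d in page]}}


base = {
    "query": {"match_all": {}},
    "sort": [{"timestamp": "asc"}, {"doc_id": "asc"}],
}
ids = [hit["_id"] for hit in iterate_hits(fake_search, base)]
print(ids)
```

Note that two documents share a timestamp in the fake data. Without the second sort key, the cursor could not tell them apart and one could be skipped.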
5. Optimizing Aggregations
Aggregations can be resource-intensive, especially on high-cardinality fields.
- Pre-computing Aggregations: Consider running complex, non-real-time aggregations during indexing or on a schedule to pre-compute results and store them in a separate index.
- doc_values: Ensure fields used in aggregations have doc_values enabled (which is the default for most non-text fields). This allows Elasticsearch to load data for aggregations efficiently without loading _source.
- eager_global_ordinals: For keyword fields frequently used in terms aggregations, setting eager_global_ordinals: true in the mapping can improve performance by pre-building global ordinals. This incurs a cost at index refresh time but speeds up query-time aggregations.
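For illustration, a mapping update that turns on eager global ordinals for one field might look like the body built below. The helper name and the `author` field are assumptions, and the body is meant for the update mapping API (PUT /my-index/_mapping).

```python
import json


def eager_ordinals_mapping(field):
    """Mapping update body enabling eager global ordinals on a keyword field."""
    return {
        "properties": {
            field: {
                "type": "keyword",
                "eager_global_ordinals": True,  # built at refresh, not at first query
            }
        }
    }


# "author" is a hypothetical field used in a frequent terms aggregation.
mapping_update = eager_ordinals_mapping("author")
print(json.dumps(mapping_update, indent=2))
```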
Leveraging Caching Techniques
Elasticsearch offers several layers of caching that can significantly speed up repeated queries.
1. Node Query Cache
- Mechanism: Caches the results of filter clauses within bool queries that are used frequently. It's an in-memory cache at the node level.
- Effectiveness: Most effective for repeated filter clauses. Do not count on it for every query; Elasticsearch decides what is worth caching.
- Configuration: Enabled by default. You can control its size with indices.queries.cache.size (default 10% of heap).
2. Shard Request Cache
Mechanism: Caches shard-level search results, most commonly for aggregation-heavy requests with size=0. It is a strong fit for repeated dashboard queries over data that is not changing every second.
Effectiveness: Excellent for dashboard queries or analytical applications where the same request (including aggregations) is executed repeatedly with identical parameters.
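Because the benefit depends on requests being identical, a time window that moves with every call works against the cache. One possible approach, sketched below under the assumption that the application builds the request body, is to round the time bound so that all calls within the same minute produce the same body. `dashboard_body` is a hypothetical helper, and the field names follow the dashboard example in this section.

```python
import json
from datetime import datetime, timedelta, timezone


def dashboard_body(now, window=timedelta(hours=1)):
    """Build a size=0 aggregation body with a time bound rounded to the minute."""
    rounded = now.replace(second=0, microsecond=0)
    start = (rounded - window).isoformat()
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"term": {"status.keyword": "active"}},
            {"range": {"timestamp": {"gte": start}}},
        ]}},
        "aggs": {"messages_per_minute": {
            "date_histogram": {"field": "timestamp", "fixed_interval": "1m"}
        }},
    }


# Two calls 43 seconds apart within the same minute yield the same body.
a = dashboard_body(datetime(2023, 3, 15, 12, 30, 5, tzinfo=timezone.utc))
b = dashboard_body(datetime(2023, 3, 15, 12, 30, 48, tzinfo=timezone.utc))
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))
```

The rounding granularity is a trade-off: a coarser bucket gives more cache hits but a less precise window.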
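How to use: Requests with size=0 are cached by default when the request cache is enabled on the index. To ask for caching explicitly, pass request_cache=true as a URL parameter rather than in the request body.
GET /my-index/_search?request_cache=true
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {"term": {"status.keyword": "active"}},
        {"range": {"timestamp": {"gte": "2023-03-15T12:00:00Z"}}}
      ]
    }
  },
  "aggs": {
    "messages_per_minute": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1m"
      }
    }
  }
}
Caveats: Cached entries are invalidated when a shard refresh picks up changed data, so the cache helps little on indices that are written to constantly. The request body serves as the cache key, so only identical requests hit it. Most queries that use now in date math are not cached, which is why the example sends an explicit timestamp (an illustrative value that the application would compute and round) instead of now-1h.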
3. Filesystem Cache (OS-level)
- Mechanism: The operating system's filesystem cache plays a critical role. Elasticsearch relies heavily on it to cache frequently accessed index segments.
- Effectiveness: Crucial for query performance. If index segments are in RAM, disk I/O is bypassed entirely, leading to much faster query execution.
- Best Practice: Leave substantial RAM for the filesystem cache. A common starting point is to give the JVM heap no more than half of system memory, staying below the compressed object pointer threshold (roughly 32 GB), and then validate with your workload.
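The memory split can be expressed as simple arithmetic. This is a starting-point sketch only: the function name is hypothetical, and the 31 GB cap is an assumed stand-in for the usual advice to keep the heap under the compressed object pointer threshold.

```python
def suggested_heap_gb(system_ram_gb, heap_cap_gb=31.0):
    """Half of RAM, capped; the remainder is left to the OS and filesystem cache."""
    return min(system_ram_gb / 2, heap_cap_gb)


for ram in (16, 64, 128):
    heap = suggested_heap_gb(ram)
    print(f"{ram} GB RAM -> heap {heap:g} GB, OS and filesystem cache {ram - heap:g} GB")
```

On large machines the cap, not the 50% rule, decides the heap size, so most additional RAM ends up serving the filesystem cache.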
4. Application-Level Caching
- Mechanism: Implementing a cache at your application layer (e.g., using Redis, Memcached, or an in-memory cache) for frequently requested search results.
- Effectiveness: Can provide the fastest response times by completely bypassing Elasticsearch for repeat requests. Best for static or slowly changing search results.
- Considerations: Cache invalidation strategy is key. Requires careful design to ensure data consistency.
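A minimal in-process version of this idea is sketched below. It is an assumption-heavy illustration: `SearchCache` is a hypothetical class, `search` is whatever callable actually queries Elasticsearch, and expiry by a fixed TTL is the simplest possible invalidation strategy, not necessarily the right one for your data.

```python
import json
import time


class SearchCache:
    """Tiny in-memory TTL cache keyed by the canonical JSON of a request."""

    def __init__(self, search, ttl_seconds=30.0, clock=time.monotonic):
        self._search = search   # callable: (index, body) -> response
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}      # key -> (expires_at, response)

    def get(self, index, body):
        # sort_keys makes the key independent of dict key order.
        key = index + "|" + json.dumps(body, sort_keys=True)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # fresh hit: Elasticsearch is not called
        response = self._search(index, body)
        self._entries[key] = (now + self._ttl, response)
        return response


calls = []


def fake_search(index, body):
    """Stand-in for a real client call; records each request it receives."""
    calls.append((index, body))
    return {"hits": {"total": {"value": 0}}}


cache = SearchCache(fake_search, ttl_seconds=30.0)
cache.get("my-index", {"query": {"match_all": {}}, "size": 0})
cache.get("my-index", {"size": 0, "query": {"match_all": {}}})  # same request, other key order
print(len(calls))
```

A shared cache such as Redis follows the same shape, with the canonical key and TTL handled by the cache server instead of a local dictionary.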
Using the Profile API for Bottleneck Identification
The Profile API is an invaluable tool for understanding exactly how Elasticsearch executes a query and where time is spent. It breaks down the execution time for each component of your query and aggregation.
How to Use the Profile API
Simply add "profile": true to your search request body.
GET /my-index/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "Elasticsearch"}},
        {"term": {"status.keyword": "published"}}
      ],
      "filter": [
        {"range": {"publish_date": {"gte": "2023-01-01"}}}
      ]
    }
  },
  "aggs": {
    "top_authors": {
      "terms": {
        "field": "author.keyword",
        "size": 10
      }
    }
  }
}
Interpreting Profile API Results
The response will include a profile section detailing query and aggregation execution on each shard. Key metrics to look for include:
- type and description: The Lucene query or aggregator class and the specific query or aggregation component it represents.
- time_in_nanos: The time spent executing this component, including its children.
- breakdown: Detailed sub-metrics such as create_weight, build_scorer, next_doc, advance, match, and score for queries, and initialize, collect, and build_aggregation for aggregations.
- children: Nested components, showing how time is distributed within complex queries.
Example Interpretation:
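If you see a high time_in_nanos for a WildcardQuery, it confirms that this is an expensive part of your query. High next_doc or advance time in a query's breakdown usually means the clause iterates over a large number of candidate documents. A high collect time in an aggregation's breakdown points to per-document aggregation work, while high build_aggregation time points to the cost of assembling each shard's result. Keep in mind that the profile covers work done on the shards: it does not measure the final merge of shard responses on the coordinating node, and _source loading belongs to the fetch phase, which is reported separately only in versions that profile fetch.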
By examining these metrics, you can pinpoint specific query clauses or aggregation fields that are consuming the most resources and then apply the optimization techniques discussed earlier.
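Profile output is verbose, so it can help to flatten it and rank components by time. The sketch below walks the query trees of a profile response. `slowest_components` is a hypothetical helper, and `sample` is a hand-written, heavily trimmed response for illustration, not real cluster output.

```python
def slowest_components(response, top=3):
    """Flatten the query trees in a profile response and rank them by time."""
    rows = []

    def walk(node, shard_id, depth):
        rows.append(
            (node["time_in_nanos"], shard_id, depth, node["type"], node["description"])
        )
        for child in node.get("children", []):
            walk(child, shard_id, depth + 1)

    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                walk(root, shard["id"], 0)
    return sorted(rows, reverse=True)[:top]


# Hand-written sample with made-up timings.
sample = {"profile": {"shards": [{
    "id": "[node-1][my-index][0]",
    "searches": [{"query": [{
        "type": "BooleanQuery",
        "description": "+title:elasticsearch +status.keyword:published",
        "time_in_nanos": 900000,
        "children": [
            {"type": "TermQuery", "description": "title:elasticsearch",
             "time_in_nanos": 600000},
            {"type": "TermQuery", "description": "status.keyword:published",
             "time_in_nanos": 200000},
        ],
    }]}],
}]}}

for nanos, shard_id, depth, kind, description in slowest_components(sample):
    print(f"{nanos / 1e6:.2f} ms  {'  ' * depth}{kind}  {description}")
```

A parent's time includes its children, so the useful signal is usually the slowest leaf rather than the top-level node.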
General Best Practices for Performance
Beyond query-specific optimizations, several cluster-wide and index-level best practices contribute to overall search performance.
1. Optimal Index Mappings
- text vs. keyword: Use text for full-text search and keyword for exact-value matching, sorting, and aggregations. Mismatched types can lead to inefficient queries.
- doc_values: Ensure doc_values are enabled for fields you intend to sort or aggregate on. They are enabled by default for most field types that support sorting and aggregations, such as keyword, numeric, date, boolean, and IP fields. Plain text fields are for full-text search; use a keyword subfield when you need exact matching or aggregation.
- norms: Disable norms ("norms": false) on text fields where you don't need document length normalization, such as fields used only for filtering. Keyword fields already have norms disabled by default. This saves disk space and improves indexing speed, with minimal impact on query performance for non-scoring queries.
- index_options: For text fields, use index_options: docs if you only need to know if a term exists in a document, and index_options: positions (the default) if you need phrase queries and proximity searches.
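Put together, a mapping that follows these points might look like the body below, intended for index creation (PUT /my-index). All field names are hypothetical and chosen only to show each option once.

```python
import json

index_body = {
    "mappings": {
        "properties": {
            # Full-text search on title, with a keyword subfield for sorting
            # and aggregations.
            "title": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            },
            # Exact-value fields: doc_values are on by default.
            "status": {"type": "keyword"},
            "publish_date": {"type": "date"},
            # Text used only for yes/no matching: no length normalization,
            # no positions (so phrase queries on this field are not possible).
            "labels": {
                "type": "text",
                "norms": False,
                "index_options": "docs",
            },
        }
    }
}
print(json.dumps(index_body, indent=2))
```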
2. Monitor Cluster Health and Resources
- Cluster Status: Green is the target. Yellow means one or more replica shards are unassigned; searches can still work, but resilience is reduced and performance may suffer. Red means primary shards are missing and some data is unavailable.
- Resource Monitoring: Regularly monitor CPU, RAM, disk I/O, and network usage on your data nodes. Spikes in these metrics often correlate with slow queries.
- JVM Heap: Keep an eye on JVM heap usage. High utilization can lead to frequent garbage collection pauses, making queries slow. Optimize queries to reduce heap pressure.
3. Proper Shard Allocation
- Too Many Shards: Each shard consumes resources. Many small shards create overhead. Shards in the tens of gigabytes are common, but the right size depends on heap, query pattern, recovery targets, and hardware.
- Too Few Shards: Limits parallelism. Queries against an index with too few shards won't be able to leverage all available data nodes efficiently.
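A first estimate of the primary shard count is just a division. The sketch below assumes a 30 GB target per shard, which is only one point in the "tens of gigabytes" range mentioned above. Both the function name and the default are illustrative, and the result should be checked against real query and recovery behavior.

```python
import math


def primary_shard_count(index_size_gb, target_shard_gb=30.0):
    """Rough primary shard count for an expected index size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


for size_gb in (10, 450, 600):
    print(f"{size_gb} GB -> {primary_shard_count(size_gb)} primary shard(s)")
```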
4. Indexing Strategy
- Refresh Interval: A lower refresh_interval (default 1 second) makes data visible faster but increases indexing overhead. If searches can tolerate slightly stale data, consider increasing it (e.g., 5-10 seconds) to reduce refresh pressure and keep cached results valid for longer.
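As a small illustration, the settings body for such a change could be built as follows. The helper name is hypothetical, and the body is meant for the update index settings API (PUT /my-index/_settings).

```python
import json


def refresh_interval_settings(seconds):
    """Settings body that sets the index refresh interval in whole seconds."""
    if seconds <= 0:
        raise ValueError("use a positive number of seconds")
    return {"index": {"refresh_interval": f"{seconds}s"}}


print(json.dumps(refresh_interval_settings(10)))
```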
The practical workflow is simple: find the real slow query, profile it, reduce the amount of data it touches, and make the mapping match the way users search. If the query is already clean, look at shard layout, heap pressure, filesystem cache, and disk I/O. Elasticsearch is fast when the index design, query shape, and cluster resources agree with each other.