Optimizing Slow Elasticsearch Queries: Best Practices for Performance Tuning

Diagnose and improve slow Elasticsearch queries with better query shape, pagination, caching, mappings, and the Profile API.

Slow Elasticsearch queries usually come from one of four places: the query asks for too much, the mapping makes the query expensive, the cluster is short on resources, or the application repeats costly searches that should be cached or redesigned. The fix depends on which one is true.

Before rewriting everything, capture a real slow request with its index, filters, sort, aggregations, page depth, response size, and timing. A dashboard aggregation, an autocomplete query, and an export job all stress Elasticsearch differently.
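
One way to capture such a request is the search slow log. The settings below are a minimal sketch, assuming an index named my-index; the thresholds are illustrative values, not recommendations, and should be set relative to your own latency targets.

```json
# Log query phases slower than 1s (info) or 2s (warn), and fetch phases slower than 1s.
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.query.info": "1s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
```

The slow log records the request source per shard, which gives you a concrete query to profile instead of a guess.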

Understanding Query Performance Bottlenecks

Those broad causes usually show up in more specific forms:

  • Complex Queries: Queries with multiple bool clauses, nested queries, or expensive operations like wildcard or regexp on large datasets.
  • Inefficient Data Retrieval: Fetching _source unnecessarily, or retrieving large numbers of documents for pagination.
  • Resource Constraints: Insufficient CPU, memory, or disk I/O on data nodes.
  • Suboptimal Mappings: Using incorrect data types or not leveraging doc_values for aggregations.
  • Shard Imbalance or Overload: Too many shards, too few shards, or uneven distribution of shards/data.
  • Cache Misses or Bad Cache Fit: Repeating expensive searches without using request caching, filter context, or application-level caching where appropriate.

Optimizing Query Structure

The way you construct your queries has a profound impact on their performance. Small changes can lead to significant improvements.

1. Retrieve Only Necessary Fields (_source Filtering & stored_fields)

By default, Elasticsearch returns the entire _source field for each matching document. If your documents are large and the UI only needs a title, ID, and timestamp, fetching the whole document wastes network bandwidth and parsing time.

  • _source Filtering: Use the _source parameter to specify an array of fields to include or exclude.

    GET /my-index/_search
    {
      "_source": ["title", "author", "publish_date"],
      "query": {
        "match": {
          "content": "Elasticsearch performance"
        }
      }
    }
    
  • stored_fields: If you've explicitly stored specific fields in your mapping ("store": true), you can retrieve them with stored_fields. Most deployments do not store many fields this way, so _source filtering is the more common fix.

    GET /my-index/_search
    {
      "stored_fields": ["title", "author"],
      "query": {
        "match": {
          "content": "Elasticsearch performance"
        }
      }
    }
    

2. Prefer Efficient Query Types

Some query types are inherently more resource-intensive than others.

  • Avoid Leading Wildcards and Broad Regexps: wildcard and regexp queries can be expensive, especially with leading wildcards such as *test. Prefix queries are usually more manageable than leading wildcard searches, but they still need sensible mappings and bounded input.

    # Inefficient: a leading wildcard has to check every term in the field
    GET /my-index/_search
    {
      "query": {
        "wildcard": {
          "name.keyword": {
            "value": "*search"
          }
        }
      }
    }
    
    # Better, if you know the prefix
    GET /my-index/_search
    {
      "query": {
        "prefix": {
          "name.keyword": {
            "value": "Elastic"
          }
        }
      }
    }
    
  • Use match_phrase for phrase intent: If the user is searching for an exact phrase, match_phrase expresses that intent better than several unrelated match clauses. It is not always cheaper, but it avoids returning documents that only contain the words far apart.

  • Filter context for yes/no conditions: When you only care whether a document matches a condition, put that condition in filter context or use constant_score. This avoids unnecessary scoring work and is more cache-friendly.

    GET /my-index/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": {
              "status": "active"
            }
          }
        }
      }
    }
    

3. Optimize Boolean Queries

  • Use filters for structured constraints: Put tenant IDs, status values, date ranges, and exact tags in filter, not must, unless they need scoring. Elasticsearch can reorder and optimize clauses internally, so do not rely on JSON order as your main performance tool.
  • Use minimum_should_match intentionally: It can improve relevance and reduce broad matches, but setting it too high can hide valid results.
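
The two points above can be sketched in a single request. The field names (title, tags, tenant_id, status, created_at) and values are hypothetical; the point is that only the text clauses contribute to scoring, while the structured constraints sit in filter context.

```json
GET /my-index/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "performance tuning"}}
      ],
      "should": [
        {"match": {"tags": "elasticsearch"}},
        {"match": {"tags": "search"}},
        {"match": {"tags": "lucene"}}
      ],
      "minimum_should_match": 2,
      "filter": [
        {"term": {"tenant_id": "acme"}},
        {"term": {"status": "active"}},
        {"range": {"created_at": {"gte": "2024-01-01"}}}
      ]
    }
  }
}
```

Here a document must match the title clause and at least two of the three should clauses. Lowering minimum_should_match to 1 widens the result set, so it is worth testing against real queries before settling on a value.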

4. Efficient Pagination (search_after and scroll)

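Traditional from/size pagination becomes very inefficient for deep pages. Each shard has to collect and sort its top from + size hits, and the coordinating node then merges those lists and discards the first from results. With five shards, from: 9990 and size: 10 means merging 50,000 sort entries to return 10 hits. By default, requests where from + size exceeds 10,000 are rejected outright (the index.max_result_window setting), so a request like from: 10000, size: 10 fails unless that limit is raised.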

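  • search_after: For real-time deep pagination, search_after is recommended. It uses the sort values of the previous page's last document to find the next set of results, similar to cursors in traditional databases. It keeps no server-side state and scales better. The sort must end in a unique tiebreaker so that no document is skipped or repeated. The example below uses _id for brevity; sorting on _id is discouraged and may be disabled by default in recent versions, so a keyword copy of the ID is a safer tiebreaker in production.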

    # First request
    GET /my-index/_search
    {
      "size": 10,
      "query": {"match_all": {}},
      "sort": [{"timestamp": "asc"}, {"_id": "asc"}]
    }
    
    # Subsequent request using the sort values of the last document from the first request
    GET /my-index/_search
    {
      "size": 10,
      "query": {"match_all": {}},
      "search_after": [1678886400000, "doc_id_XYZ"],
      "sort": [{"timestamp": "asc"}, {"_id": "asc"}]
    }
    
  • scroll API: For bulk retrieval of large datasets, such as reindexing or exports, scroll can still be useful. On Elasticsearch 7.10 and later, a point in time (PIT) combined with search_after is generally the preferred way to page through more than 10,000 hits against a consistent view of the index. Scroll is not suitable for user-facing real-time pagination.
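
A point-in-time scan can be sketched as follows, assuming Elasticsearch 7.10 or later. The PIT ID is a placeholder, and the search_after values are illustrative; in practice both are copied from the previous response.

```json
# 1. Open a point in time. The response contains an "id".
POST /my-index/_pit?keep_alive=1m

# 2. First page. The index comes from the PIT, so the path has no index name.
GET /_search
{
  "size": 1000,
  "query": {"match_all": {}},
  "pit": {"id": "<pit-id-from-step-1>", "keep_alive": "1m"},
  "sort": [{"timestamp": "asc"}]
}

# 3. Next pages. Copy the full "sort" array of the last hit into search_after,
#    and use the PIT ID returned by the previous response.
GET /_search
{
  "size": 1000,
  "query": {"match_all": {}},
  "pit": {"id": "<pit-id-from-previous-response>", "keep_alive": "1m"},
  "sort": [{"timestamp": "asc"}],
  "search_after": [1678886400000, 4294967298]
}

# 4. Close the PIT when the scan is finished.
DELETE /_pit
{
  "id": "<pit-id-from-previous-response>"
}
```

Recent versions add an implicit _shard_doc tiebreaker to the sort when a PIT is used, which is why the search_after array has one more value than the sort definition. On older 7.x releases you may need to add a tiebreaker field yourself.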

5. Optimizing Aggregations

Aggregations can be resource-intensive, especially on high-cardinality fields.

  • Pre-computing Aggregations: Consider running complex, non-real-time aggregations during indexing or on a schedule to pre-compute results and store them in a separate index.
  • doc_values: Ensure fields used in aggregations have doc_values enabled (which is the default for most non-text fields). This allows Elasticsearch to load data for aggregations efficiently without loading _source.
  • eager_global_ordinals: For keyword fields frequently used in terms aggregations, setting eager_global_ordinals: true in the mapping can improve performance by pre-building global ordinals. This incurs a cost at index refresh time but speeds up query time aggregations.
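
As a sketch of the last point, the mapping update below turns on eager global ordinals for a hypothetical keyword field named category. It assumes the field already exists as a keyword in my-index.

```json
PUT /my-index/_mapping
{
  "properties": {
    "category": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
```

This moves the cost of building global ordinals from the first aggregation after a refresh to the refresh itself, so it tends to pay off only for fields that are aggregated on frequently.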

Leveraging Caching Techniques

Elasticsearch offers several layers of caching that can significantly speed up repeated queries.

1. Node Query Cache

  • Mechanism: Caches the results of filter clauses within bool queries that are used frequently. It's an in-memory cache at the node level.
  • Effectiveness: Most effective for repeated filter clauses. Do not count on it for every query; Elasticsearch decides what is worth caching.
  • Configuration: Enabled by default. You can control its size with indices.queries.cache.size (default 10% of heap).
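
To check whether the node query cache is doing useful work, the node stats API reports its size and hit counts. This is a quick inspection request, not a tuning recipe.

```json
# Per-node query cache usage: memory size, hit_count, miss_count, evictions
GET /_nodes/stats/indices/query_cache?human
```

A high eviction count relative to hits suggests the cache is too small for the filter set in use, or that filters vary too much between requests to be reused.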

2. Shard Request Cache

  • Mechanism: Caches shard-level search results, most commonly for aggregation-heavy requests with size=0. It is a strong fit for repeated dashboard queries over data that is not changing every second.

  • Effectiveness: Excellent for dashboard queries or analytical applications where the same request (including aggregations) is executed repeatedly with identical parameters.

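  • How to use: The request cache is enabled by default (index.requests.cache.enable) and applies to size=0 requests. To override that for a single request, set the request_cache query-string parameter; it goes in the URL, not in the request body.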

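    # The range uses explicit timestamps (example values) computed by the application and
    # rounded to the minute, because most requests that use "now" are not cached.
    GET /my-index/_search?request_cache=true
    {
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            {"term": {"status.keyword": "active"}},
            {"range": {"timestamp": {"gte": "2024-05-01T10:00:00Z", "lt": "2024-05-01T11:00:00Z"}}}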
          ]
        }
      },
      "aggs": {
        "messages_per_minute": {
          "date_histogram": {
            "field": "timestamp",
            "fixed_interval": "1m"
          }
        }
      }
    }
    
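  • Caveats: Cached entries are invalidated when a shard refreshes and its data has actually changed (documents indexed, updated, or deleted), so the cache helps little on indices that are written to constantly. The request body is the cache key, so a request only gets a hit if it is identical to an earlier one, including key order.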
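
The requests below are a small sketch for inspecting and managing the request cache on an index named my-index.

```json
# Request cache memory, hit_count, miss_count, and evictions for one index
GET /my-index/_stats/request_cache?human

# Drop cached entries for the index, for example before a benchmark run
POST /my-index/_cache/clear?request=true

# Turn the request cache off for an index where it never gets hits
PUT /my-index/_settings
{
  "index.requests.cache.enable": false
}
```

If hit_count stays near zero for a dashboard workload, compare two consecutive request bodies: a changing timestamp or reordered keys is a common reason the cache never matches.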

3. Filesystem Cache (OS-level)

  • Mechanism: The operating system's filesystem cache plays a critical role. Elasticsearch relies heavily on it to cache frequently accessed index segments.
  • Effectiveness: Crucial for query performance. If index segments are in RAM, disk I/O is bypassed entirely, leading to much faster query execution.
  • Best Practice: Leave substantial RAM for the filesystem cache. A common starting point is to set JVM heap to no more than about half of system memory and to keep it below the compressed object pointers threshold (a little under 32 GB), then validate with your workload.

4. Application-Level Caching

  • Mechanism: Implementing a cache at your application layer (e.g., using Redis, Memcached, or an in-memory cache) for frequently requested search results.
  • Effectiveness: Can provide the fastest response times by completely bypassing Elasticsearch for repeat requests. Best for static or slowly changing search results.
  • Considerations: Cache invalidation strategy is key. Requires careful design to ensure data consistency.

Using the Profile API for Bottleneck Identification

The Profile API is an invaluable tool for understanding exactly how Elasticsearch executes a query and where time is spent. It breaks down the execution time for each component of your query and aggregation.

How to Use the Profile API

Simply add "profile": true to your search request body.

    GET /my-index/_search
    {
      "profile": true,
      "query": {
        "bool": {
          "must": [
            {"match": {"title": "Elasticsearch"}},
            {"term": {"status.keyword": "published"}}
          ],
          "filter": [
            {"range": {"publish_date": {"gte": "2023-01-01"}}}
          ]
        }
      },
      "aggs": {
        "top_authors": {
          "terms": {
            "field": "author.keyword",
            "size": 10
          }
        }
      }
    }

Interpreting Profile API Results

The response will include a profile section detailing query and aggregation execution on each shard. Key metrics to look for include:

  • description: The specific query or aggregation component.
  • time_in_nanos: The time spent executing this component.
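  • breakdown: Low-level timings for the component. For queries these include create_weight, build_scorer, next_doc, advance, match, and score; for aggregations they include initialize, build_leaf_collector, collect, and build_aggregation. Each timing has a matching _count entry. Collectors are reported in their own collector section.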
  • children: Nested components, showing how time is distributed within complex queries.

Example Interpretation:

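If you see a high time_in_nanos for a wildcard clause, it confirms that this is an expensive part of your query. If next_doc or advance dominate a clause's breakdown, that clause is iterating over a large number of matching documents. If the collector section is slow, the query matches so many documents that collecting and sorting them is the cost, and tighter filters usually help more than rewriting individual clauses. For aggregations, compare collect and build_aggregation to see whether time goes into visiting documents or into building buckets. Keep in mind that the profile covers shard-level execution only: it does not include queue time, network latency, or the merge of shard responses on the coordinating node, so a fast profile on a slow request points to a bottleneck outside the query itself, such as an overloaded search thread pool or a heavy coordinating-node merge.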

By examining these metrics, you can pinpoint specific query clauses or aggregation fields that are consuming the most resources and then apply the optimization techniques discussed earlier.

General Best Practices for Performance

Beyond query-specific optimizations, several cluster-wide and index-level best practices contribute to overall search performance.

1. Optimal Index Mappings

  • text vs. keyword: Use text for full-text search and keyword for exact-value matching, sorting, and aggregations. Mismatched types can lead to inefficient queries.
  • doc_values: Ensure doc_values are enabled for fields you intend to sort or aggregate on. They are enabled by default for most field types that support sorting and aggregations, such as keyword, numeric, date, boolean, and IP fields. Plain text fields are for full-text search; use a keyword subfield when you need exact matching or aggregation.
  • norms: Disable norms ("norms": false) on text fields that are used only for filtering and never need length-normalized relevance scoring. Keyword fields already have norms disabled by default, so ID-style fields mapped as keyword need no change. This saves disk space, with little effect on queries that do not score on that field.
  • index_options: For text fields, use index_options: docs if you only need to know if a term exists in a document, and index_options: positions (the default) if you need phrase queries and proximity searches.
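
The mapping below is a sketch that combines these options. The index name articles and all field names are hypothetical and chosen only to show each setting once.

```json
PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {"type": "keyword", "ignore_above": 256}
        }
      },
      "status": {"type": "keyword"},
      "publish_date": {"type": "date"},
      "labels_text": {
        "type": "text",
        "norms": false,
        "index_options": "docs"
      }
    }
  }
}
```

With this layout, full-text search runs on title, sorting and terms aggregations use title.keyword or status, and labels_text supports only yes/no term matching: phrase queries on it will not work because positions are not indexed.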

2. Monitor Cluster Health and Resources

  • Cluster Status: Green is the target. Yellow means one or more replica shards are unassigned; searches can still work, but resilience is reduced and performance may suffer. Red means primary shards are missing and some data is unavailable.
  • Resource Monitoring: Regularly monitor CPU, RAM, disk I/O, and network usage on your data nodes. Spikes in these metrics often correlate with slow queries.
  • JVM Heap: Keep an eye on JVM heap usage. High utilization can lead to frequent garbage collection pauses, making queries slow. Optimize queries to reduce heap pressure.
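
A few built-in requests cover most of these checks. This is a starting set, assuming you have access to the cluster APIs; a monitoring stack gives the same data as time series.

```json
# Overall status (green / yellow / red) and unassigned shard counts
GET /_cluster/health

# Heap, RAM, CPU, and load per node
GET /_cat/nodes?v=true&h=name,heap.percent,ram.percent,cpu,load_1m

# Search thread pool: a growing queue or non-zero rejected count means searches are backing up
GET /_cat/thread_pool/search?v=true&h=node_name,active,queue,rejected

# Garbage collection counts and times per node
GET /_nodes/stats/jvm
```

Sampling these while a known slow query runs helps separate a query problem from a resource problem.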

3. Proper Shard Allocation

  • Too Many Shards: Each shard consumes resources. Many small shards create overhead. Shards in the tens of gigabytes are common, but the right size depends on heap, query pattern, recovery targets, and hardware.
  • Too Few Shards: Limits parallelism. Queries against an index with too few shards won't be able to leverage all available data nodes efficiently.
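
To see how shards are actually sized and distributed, the cat APIs give a quick view. The index name my-index is an assumption.

```json
# Shards of one index, largest first, with the node each one lives on
GET /_cat/shards/my-index?v=true&h=index,shard,prirep,state,docs,store,node&s=store:desc

# Indices by size, with primary and replica counts
GET /_cat/indices?v=true&h=index,pri,rep,docs.count,store.size&s=store.size:desc
```

Many shards of a few megabytes each, or one node holding most of the large shards, are the patterns worth investigating first.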

4. Indexing Strategy

  • Refresh Interval: A lower refresh_interval (default 1 second) makes data visible faster, but each refresh creates a new segment and invalidates cached results for shards whose data changed. If your freshness requirements allow it, consider increasing it (e.g., 5-10 seconds) to reduce refresh pressure.
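
The refresh interval is a dynamic index setting, so it can be changed without reindexing. The value below is an example, assuming a ten-second delay before new documents become searchable is acceptable.

```json
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "10s"
  }
}
```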

The practical workflow is simple: find the real slow query, profile it, reduce the amount of data it touches, and make the mapping match the way users search. If the query is already clean, look at shard layout, heap pressure, filesystem cache, and disk I/O. Elasticsearch is fast when the index design, query shape, and cluster resources agree with each other.