Troubleshooting Slow Redis Commands: A Performance Checklist

A practical checklist for finding slow Redis commands with SLOWLOG, MONITOR, latency tools, command complexity, and safer fixes.

Slow Redis commands usually start as normal commands that outgrow their assumptions. A SMEMBERS call was harmless when the set had 200 members. A dashboard query was fine when it loaded 50 keys. A Lua script was quick until one customer created a much larger data shape than everyone else.

The useful question is not just "which command is slow?" It is "what data shape made this command slow, and why is the application asking Redis to do that much work in one turn?"

Understanding Redis Performance

Redis is usually fast because it serves data from memory. Several factors can still add command latency:

  • Command Complexity: Certain commands are inherently more resource-intensive than others (e.g., KEYS on a large dataset vs. GET).
  • Data Size and Structure: Large lists, sets, or sorted sets, or complex data structures, can impact the performance of commands that operate on them.
  • Network Latency: While not directly a command issue, high network latency between the client and server can make commands appear slow.
  • Server Load: High CPU usage, insufficient memory, or other processes on the Redis server can degrade performance.
  • Blocking Commands: Redis executes commands on a single main thread, so one long-running operation delays every command queued behind it.

Identifying Slow Commands with SLOWLOG

The SLOWLOG command is Redis's built-in mechanism for logging commands that exceed a specified execution time. It is the first place to look when you need to find out which commands have been slow inside the server.

How SLOWLOG Works

Redis maintains a circular buffer that stores information about commands that took longer than the configured slowlog-log-slower-than threshold (in microseconds). The default threshold is 10000 microseconds (10 milliseconds). The buffer length is set by slowlog-max-len (128 entries by default); when the buffer fills up, the oldest entries are discarded.

Key SLOWLOG Subcommands

  • SLOWLOG GET [count]: Retrieves the most recent count entries from the slow log. If count is omitted, it returns the 10 most recent entries.
  • SLOWLOG LEN: Returns the current length of the slow log (number of entries).
  • SLOWLOG RESET: Clears the slow log entries. Use this command with caution, as it permanently removes the logged data.

Example Usage of SLOWLOG

Let's assume you suspect some commands are taking too long. You can check the slow log as follows:

# Connect to your Redis instance
redis-cli

# Get the last 5 slow commands
127.0.0.1:6379> SLOWLOG GET 5

The output will look something like this:

1) 1) (integer) 18
   2) (integer) 1678886400
   3) (integer) 15000
   4) 1) "KEYS"
      2) "*"

2) 1) (integer) 17
   2) (integer) 1678886390
   3) (integer) 12000
   4) 1) "SMEMBERS"
      2) "my_large_set"

...

Explanation of the output:

  1. Entry ID: A unique identifier for the slow log entry.
  2. Timestamp: The Unix timestamp when the command was executed.
  3. Execution Time: The duration (in microseconds) the command took to execute.
  4. Command and Arguments: The command itself and its arguments.

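In the example above, KEYS * took 15000 microseconds (15 ms) and SMEMBERS my_large_set took 12000 microseconds (12 ms). Both exceed a slowlog-log-slower-than threshold of 10000 microseconds, which is why they were logged. On Redis 4.0 and later, each entry also carries two more fields, the client address and the client name, which this sample output does not show.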
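
Once the log has more than a handful of entries, grouping them by command name makes repeat offenders easier to see. The sketch below is one possible way to do that, assuming entries arrive as dicts with "command" and "duration" keys, which is the shape redis-py's slowlog_get() returns; the function name summarize_slowlog is hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow log entries by command name, worst total time first.

    Assumes each entry is a dict with "command" (bytes or str) and
    "duration" in microseconds, as returned by redis-py's slowlog_get().
    """
    stats = defaultdict(lambda: {"count": 0, "max_usec": 0, "total_usec": 0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        # The first token is the command name; the rest are arguments.
        name = command.split(" ", 1)[0].upper()
        bucket = stats[name]
        bucket["count"] += 1
        bucket["total_usec"] += entry["duration"]
        bucket["max_usec"] = max(bucket["max_usec"], entry["duration"])
    return sorted(stats.items(), key=lambda item: item[1]["total_usec"], reverse=True)
```

With a redis-py client, the call would look something like summarize_slowlog(client.slowlog_get(128)). Ranking by total time rather than worst single call is a judgment call; a command that is moderately slow a thousand times often matters more than one outlier.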

Configuring slowlog-log-slower-than

You can dynamically change the slowlog-log-slower-than threshold using the CONFIG SET command:

# Log commands slower than 50 ms (the value is in microseconds)
127.0.0.1:6379> CONFIG SET slowlog-log-slower-than 50000

To make this change persistent across Redis restarts, you would need to modify the redis.conf file and restart the Redis server, or use CONFIG REWRITE to save the changes to the configuration file.

Real-time Command Monitoring with MONITOR

While SLOWLOG provides a historical view, MONITOR offers a real-time stream of all commands being executed by the Redis server. This is invaluable for debugging during a specific period of slow performance or for understanding command traffic patterns.

How MONITOR Works

When you enable MONITOR, Redis sends a response to the MONITOR client for every command it receives and processes. This can generate a very high volume of output, especially on busy Redis instances. Therefore, it's generally recommended to use MONITOR sparingly and only when actively debugging.

Example Usage of MONITOR

From a separate redis-cli session, execute the MONITOR command:

# Connect to your Redis instance in a *separate* terminal
redis-cli

# Start monitoring
127.0.0.1:6379> MONITOR

Now, any command executed in another redis-cli session or by your application will appear in the MONITOR output. For example, if you run SET mykey myvalue in another client, you'll see:

1678887000.123456 [0 127.0.0.1:54321] "SET" "mykey" "myvalue"

Using MONITOR for Debugging

  1. Reproduce the Issue: When you notice a slowdown, immediately start MONITOR in a dedicated redis-cli session.
  2. Trigger the Slow Operation: Have your application perform the action that you suspect is causing the slowdown.
  3. Analyze the Output: Observe the commands in the MONITOR stream. Look for:
    • Gaps or bursts in the timestamps. MONITOR does not report execution time, so a pause before the next entry is only a hint that something held the server up; confirm it with SLOWLOG.
    • Unusual or unexpected commands being executed.
    • A high volume of commands that might be overloading the server.
  4. Stop Monitoring: Press Ctrl+C to exit the MONITOR command.

Important: Do not run MONITOR in a production environment for extended periods, as it can significantly impact Redis performance due to the overhead of sending every command to the client.
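
A short MONITOR capture is easier to analyze once it is parsed. The following is a minimal sketch that assumes lines in the format shown above (timestamp, database and client in brackets, then quoted arguments); parse_monitor_line and count_commands are hypothetical helper names, and unusual escaping in arguments may not round-trip exactly.

```python
import re
import shlex
from collections import Counter

# Matches lines such as:
# 1678887000.123456 [0 127.0.0.1:54321] "SET" "mykey" "myvalue"
MONITOR_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.+)$"
)


def parse_monitor_line(line):
    """Return a dict for one MONITOR line, or None if it does not match."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None
    try:
        args = shlex.split(match.group("args"))
    except ValueError:
        # Unbalanced quotes or similar; skip rather than guess.
        return None
    if not args:
        return None
    return {
        "timestamp": float(match.group("ts")),
        "db": int(match.group("db")),
        "client": match.group("client"),
        "command": args[0].upper(),
        "args": args[1:],
    }


def count_commands(lines):
    """Count how often each command name appears in a MONITOR capture."""
    counts = Counter()
    for line in lines:
        parsed = parse_monitor_line(line)
        if parsed is not None:
            counts[parsed["command"]] += 1
    return counts
```

Feeding a saved capture through count_commands gives a quick view of which commands dominate the traffic, which covers the "high volume" case in the list above without reading the stream by eye.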

Common Causes of Slow Commands and How to Fix Them

Based on the information gathered from SLOWLOG and MONITOR, here are common culprits and their solutions:

1. KEYS Command

  • Problem: The KEYS command iterates over the entire keyspace to find keys matching a pattern. On databases with millions of keys, this can take a very long time and block the Redis server, affecting all other clients.
  • Solution: Avoid KEYS on large production keyspaces. Use SCAN when you need incremental key iteration. SCAN returns a subset of keys matching a pattern in each call, which reduces the chance of blocking the server for a long time.
      # Instead of KEYS user:*
      redis-cli -h <host> -p <port> SCAN 0 MATCH 'user:*' COUNT 100
    
    You'll need to call SCAN multiple times, using the cursor returned by the previous call, until the cursor returns to 0.
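
The cursor loop can be sketched in application code as follows. This assumes a client object whose scan() method takes cursor, match, and count and returns a (cursor, keys) pair, as redis-py's does; scan_keys is a hypothetical name.

```python
def scan_keys(client, pattern, count=100):
    """Yield keys matching pattern by following the SCAN cursor.

    Assumes client.scan(cursor=..., match=..., count=...) returns
    (next_cursor, keys), as redis-py's Redis.scan does.
    """
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        # A single call may return zero keys and still have more to scan.
        yield from keys
        if cursor == 0:
            # The cursor returning to 0 marks the end of the iteration.
            break
```

Note that the loop ends on the cursor value, not on an empty batch. redis-py also ships scan_iter(), which wraps the same loop; the explicit version is shown here only to make the cursor handling visible.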

2. Complex Scripting (Lua Scripts)

  • Problem: Long-running or inefficient Lua scripts executed via EVAL or EVALSHA can block the server. While Redis executes scripts atomically, a single long script can monopolize the event loop.
  • Solution: Optimize your Lua scripts. Break down complex logic into smaller, manageable scripts. Analyze script performance. Ensure loops within scripts are efficient and terminate correctly. Benchmark your scripts to understand their execution time.

3. Operations on Large Data Structures

  • Problem: Commands like SMEMBERS on a set with millions of members, LRANGE on a very long list, or ZRANGE on a huge sorted set can be slow.
  • Solution: Avoid fetching entire large data structures. Instead, use iterative commands or process data in chunks:
    • Sets: Use SSCAN instead of SMEMBERS.
    • Lists: Use LRANGE with smaller start and stop values to retrieve data in pages.
    • Sorted Sets: Use ZRANGE with explicit start and stop indexes, ZRANGEBYSCORE with LIMIT, or ZSCAN.
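
For lists, paging might look like the sketch below. It assumes a client with an lrange(name, start, end) method using inclusive indexes, as in redis-py; iter_list_pages is a hypothetical helper. If the list is being modified while you page, items can shift between pages, so treat this as suitable for background processing rather than an exact snapshot.

```python
def iter_list_pages(client, key, page_size=100):
    """Yield a long list in bounded pages instead of one LRANGE 0 -1."""
    start = 0
    while True:
        # LRANGE indexes are inclusive on both ends.
        page = client.lrange(key, start, start + page_size - 1)
        if not page:
            break
        yield page
        if len(page) < page_size:
            # A short page means the end of the list was reached.
            break
        start += page_size
```

Each call returns at most page_size elements, so no single command does work proportional to the whole list.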

4. Commands Requiring Key Iteration (Less Common but Possible)

  • Problem: Some commands do work proportional to the total size of their inputs even though nothing in the name suggests a scan. SORT, SUNION, SDIFF, and ZUNIONSTORE are examples: they are quick on small keys and slow on very large ones.
  • Solution: Review the Redis command reference for the specific command and check its documented time complexity. Consider alternative data structures or precomputed results if a specific command proves to be a bottleneck.

5. Blocking Commands (Rare in Modern Redis)

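  • Problem: Redis runs commands on one main thread, so any long-running command blocks every other client. Older Redis versions offered fewer ways around this. Newer versions added non-blocking alternatives, such as UNLINK (Redis 4.0) in place of DEL on very large keys and the ASYNC option for FLUSHALL and FLUSHDB.
  • Solution: Use a recent version of Redis and prefer the non-blocking variants where they exist. Consult the Redis documentation for any known blocking operations specific to your version.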

First Decide Whether Redis Is Slow or the Client Is Waiting

When someone says "Redis is slow," they may mean several different things. The server may be spending too long executing a command. The client may be waiting on the network. A connection pool may be exhausted. A TLS proxy may be overloaded. A large response may be taking longer to transfer than the command took to execute.

SLOWLOG only records command execution time inside Redis. It does not include network transfer time, client queuing time, or time spent waiting for a connection from an application pool. That is why a clean slow log does not always prove users are imagining latency.

Compare three views:

redis-cli --latency -h <host> -p <port>
redis-cli --latency-history -h <host> -p <port>
redis-cli SLOWLOG GET 10

If latency is high but SLOWLOG is empty, look at network, client pools, server CPU saturation, fork activity, persistence, or large replies. If SLOWLOG shows repeated expensive commands, start with command and data-structure design.

In applications, add timing around Redis calls at the client boundary. Log the command family, key pattern, elapsed time, and whether the client waited for a pool connection. Do not log secrets or full payloads. A small amount of structured timing usually answers whether the delay is inside Redis or before the command even reaches it.
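
One way to add that timing is a small context manager around each call site. This is a sketch under stated assumptions: the logger name, the 50 ms default, and the pool_wait_ms field are all hypothetical, and how you measure pool wait depends on your client library.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("redis.timing")  # hypothetical logger name


@contextmanager
def timed_redis_call(command_family, key_pattern, slow_ms=50.0):
    """Time one Redis call at the client boundary and log it if slow.

    Pass a key pattern such as "user:*", not the full key or payload.
    The caller may add fields (for example "pool_wait_ms") to the record.
    """
    record = {"command": command_family, "key_pattern": key_pattern}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["elapsed_ms"] = (time.perf_counter() - start) * 1000.0
        if record["elapsed_ms"] >= slow_ms:
            logger.warning("slow redis call: %s", record)
```

Usage would be along the lines of "with timed_redis_call("GET", "user:*") as rec: value = client.get(key)". Because the timer wraps the whole client call, it includes pool wait and network time, which is exactly what SLOWLOG leaves out.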

Use Command Complexity as a Smell Test

Redis commands are fast when they touch a small, bounded amount of data. They become risky when they scan a large keyspace, return a huge collection, or do work proportional to a large value.

Before blaming hardware, check the command's complexity in the Redis command reference for your version. You do not need to memorize every complexity label, but the shape matters:

  • GET user:123 is bounded by the size of one value.
  • HGET profile:123 email is bounded by one hash lookup.
  • SMEMBERS followers:celebrity returns the whole set.
  • KEYS * scans the whole keyspace.
  • LRANGE queue 0 -1 returns the whole list.
  • ZREMRANGEBYSCORE may remove a large number of sorted-set members.

The risky pattern is usually "give me everything." It may work for months, then fail when a set grows from hundreds of members to millions. Redis did not suddenly become slow; the data crossed the point where an unbounded command became visible.

Safer Replacements for Common Slow Patterns

Replace whole-keyspace and whole-collection commands with incremental patterns.

For key discovery, use SCAN:

redis-cli --scan --pattern 'user:*'

For sets, use SSCAN:

SSCAN active_users 0 COUNT 500

For hashes, use HSCAN:

HSCAN user:123:settings 0 COUNT 200

For sorted sets, prefer ranges with explicit bounds and limits:

ZRANGE leaderboard 0 99 WITHSCORES
ZRANGEBYSCORE events 1716600000 1716686400 LIMIT 0 500

For lists, page with bounded ranges:

LRANGE recent_jobs 0 99

SCAN is incremental, but it is not a magic free operation. It may return duplicates, and it does not give a perfectly consistent snapshot while keys are changing. It is good for maintenance, migration, and background discovery. It is usually not the right primitive for a user-facing request path that needs a precise real-time list.
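
If the caller cannot tolerate duplicates, they have to be filtered on the client side. A minimal sketch, assuming a client whose sscan(name, cursor=..., count=...) returns a (cursor, members) pair as redis-py's does; unique_set_members is a hypothetical name:

```python
def unique_set_members(client, key, count=500):
    """Iterate a set with SSCAN, yielding each member at most once."""
    seen = set()
    cursor = 0
    while True:
        cursor, members = client.sscan(key, cursor=cursor, count=count)
        for member in members:
            if member not in seen:
                seen.add(member)
                yield member
        if cursor == 0:
            break
```

The seen set grows with the collection, so this trades client memory for uniqueness. For very large sets it may be better to make the downstream processing idempotent and accept the occasional duplicate.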

Large Replies Can Be the Real Cost

A command can execute quickly and still hurt your application if it returns too much data. SMEMBERS on a huge set, HGETALL on a large hash, or MGET over thousands of large values may spend time serializing the reply and sending it over the network. That cost may not show up clearly as command execution time alone.

Watch network output and client memory during the slow operation. If a single request returns tens or hundreds of megabytes, redesign the access pattern. Store summary data separately. Page the result. Use a sorted set index and fetch only the visible slice. Avoid placing large documents in Redis when the application usually needs one field.

A practical example: if a dashboard shows the latest 50 jobs, do not store every job ID in a list and call LRANGE jobs 0 -1 before slicing in the app. Store the list in newest-first order and request only what the page needs:

LRANGE jobs:recent 0 49

That small change can remove a surprising amount of latency and memory pressure.

MONITOR Is a Scalpel, Not a Dashboard

MONITOR is useful when you need to see exactly what commands a client sends, especially when you suspect the application is doing something different from what the code review suggests. But on a busy Redis server, MONITOR creates overhead and produces a flood of output.

Use it for a short, controlled window:

redis-cli MONITOR | head -n 200

Then stop it. In production, prefer sampling from application logs, Redis command stats, or a short maintenance window when possible.

INFO commandstats is often safer for a broad view:

redis-cli INFO commandstats

It shows per-command call counts and cumulative microseconds. It will not tell you which key was slow, but it can reveal that an application is issuing far more HGETALL, KEYS, or EVAL calls than expected.
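
The raw output is a list of lines such as cmdstat_get:calls=1000,usec=2000,usec_per_call=2.00. The sketch below parses that text and ranks commands by cumulative time; parse_commandstats and top_by_total_time are hypothetical helper names, and fields beyond calls, usec, and usec_per_call vary by Redis version.

```python
def parse_commandstats(info_text):
    """Parse the text of INFO commandstats into {COMMAND: {field: value}}."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the section header and blank lines
        name, _, fields = line.partition(":")
        values = {}
        for pair in fields.split(","):
            field, _, value = pair.partition("=")
            try:
                values[field] = float(value)
            except ValueError:
                continue  # ignore anything that is not numeric
        stats[name[len("cmdstat_"):].upper()] = values
    return stats


def top_by_total_time(stats, limit=5):
    """Return the commands with the highest cumulative microseconds."""
    ranked = sorted(stats.items(), key=lambda kv: kv[1].get("usec", 0.0), reverse=True)
    return ranked[:limit]
```

These counters are cumulative since the last restart or CONFIG RESETSTAT, so comparing two snapshots taken a few minutes apart is usually more informative than a single reading.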

Lua Scripts Need Boundaries

Lua scripts are powerful because they run atomically inside Redis. That same atomic behavior means a long script blocks other commands while it runs. Slow scripts often come from loops over large collections, unbounded key discovery, or logic that grew from a tiny helper into a mini application.

Review scripts with the same questions:

  • How many keys can this touch?
  • How many elements can this loop over?
  • What happens when the input key has one million members?
  • Can the work be split into smaller chunks?
  • Does the script return a large payload?

If a script appears in SLOWLOG, resist the temptation to only raise slowlog-log-slower-than. The log is telling you that one atomic block is taking long enough to affect other clients.
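
Where atomicity across the whole operation is not required, one alternative to a single long script is to split the work into bounded batches issued from the client. The sketch below removes members from a set in chunks; it assumes a client with srem(name, *values) returning the number removed, as in redis-py, and the names chunks and remove_members_in_batches are hypothetical.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield lists of at most size items from iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def remove_members_in_batches(client, key, members, batch_size=500):
    """Remove many set members with several short SREM calls.

    Each call is atomic on its own, but the operation as a whole is not:
    other clients can run commands between batches.
    """
    removed = 0
    for batch in chunks(members, batch_size):
        removed += client.srem(key, *batch)
    return removed
```

The trade-off is explicit: other clients get a turn between batches, and in exchange the application has to tolerate seeing the set in a partially processed state.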

Persistence, Forks, and "Slow Commands" That Are Symptoms

Sometimes commands are slow because Redis is busy with background work. RDB snapshots and AOF rewrite operations can increase CPU, memory pressure, and disk I/O. On Linux, forking a large Redis process can also create latency spikes, especially when memory overcommit settings, transparent huge pages, or slow storage are involved.

Check:

redis-cli INFO persistence
redis-cli INFO stats
redis-cli INFO memory
redis-cli LATENCY LATEST

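LATENCY LATEST reports events only when the latency monitor is enabled, which requires latency-monitor-threshold to be set to a value above 0 (in milliseconds); it is disabled by default. If latency spikes line up with background saves or AOF rewrites, tune persistence carefully. You may need faster storage, adjusted save policies, AOF rewrite thresholds, or memory settings. Do not disable persistence just to make a benchmark look better unless Redis is purely a disposable cache and the business accepts losing the data.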
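
During an incident it helps to turn those INFO fields into a short list of findings. The sketch below assumes the INFO output has already been parsed into a dict (redis-py's info() does this) and reads rdb_bgsave_in_progress, aof_rewrite_in_progress, and latest_fork_usec. The 100 ms fork warning threshold and the function name are assumptions to adjust for your environment.

```python
def background_work_report(info, fork_warn_usec=100_000):
    """List background activity that could explain a latency spike.

    info is a dict merged from INFO persistence and INFO stats.
    fork_warn_usec is an arbitrary illustrative threshold.
    """
    findings = []
    if info.get("rdb_bgsave_in_progress"):
        findings.append("RDB background save in progress")
    if info.get("aof_rewrite_in_progress"):
        findings.append("AOF rewrite in progress")
    fork_usec = info.get("latest_fork_usec", 0)
    if fork_usec >= fork_warn_usec:
        findings.append(f"last fork took {fork_usec / 1000:.1f} ms")
    return findings
```

An empty list does not prove persistence is innocent, since a save may have finished just before you looked. It only tells you nothing is running at the moment of the check.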

Client Behavior Can Overload Redis Without One Bad Command

A Redis server can be hurt by millions of tiny inefficient calls just as much as by one obviously slow command. A page that makes 200 sequential GET calls will feel slow even if every individual GET is fast.

Use pipelining when the application needs many independent commands and can tolerate receiving replies together:

GET user:1
GET user:2
GET user:3

Sent as a pipeline, these three commands share one round trip instead of paying for three. Pipelining is not a replacement for good data modeling, and it can increase memory use if batches are too large. Start with modest batch sizes and measure.
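
In application code, a bounded pipeline might look like the following sketch. It assumes a client with a pipeline(transaction=False) method whose returned object queues get() calls and returns all replies from execute(), as in redis-py; pipelined_get and the default batch size of 100 are illustrative choices.

```python
def pipelined_get(client, keys, batch_size=100):
    """Fetch many keys with one round trip per batch instead of per key."""
    keys = list(keys)
    results = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # transaction=False sends a plain pipeline without MULTI/EXEC.
        pipe = client.pipeline(transaction=False)
        for key in batch:
            pipe.get(key)
        # Replies come back in the same order the commands were queued.
        for key, value in zip(batch, pipe.execute()):
            results[key] = value
    return results
```

Capping the batch size keeps both the request buffer and the reply bounded. For plain GET lookups, MGET is another option; the pipeline form generalizes to mixed commands.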

Also inspect connection pools. If application logs show Redis calls taking 500 ms but Redis sees no slow commands, the app may be waiting for a free connection. Increase the pool only after checking why existing connections are busy. A bigger pool can hide the symptom while increasing pressure on Redis.

A Practical Incident Checklist

When Redis latency is hurting users, collect facts in this order:

date -u
redis-cli PING
redis-cli --latency -i 1
redis-cli SLOWLOG GET 20
redis-cli INFO commandstats
redis-cli INFO clients
redis-cli INFO memory
redis-cli INFO persistence
redis-cli LATENCY LATEST

Then ask what changed: a deploy, a new endpoint, a data growth event, a batch job, a migration, a dashboard query, a new cache key pattern, or a persistence rewrite. Redis slowdowns are often tied to a single access pattern that became popular or a key that became much larger than expected.

For each slow command, write down the key pattern and owner. "Slow SMEMBERS" is not enough. "The recommendations service calls SMEMBERS product:123:viewers on a set that can grow without limit" is actionable.
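
Turning concrete keys into patterns can be partly automated. The sketch below assumes colon-separated keys where purely numeric segments are IDs, which may not match your naming scheme; key_pattern and group_by_pattern are hypothetical helpers. The owner still has to come from a person or a service registry.

```python
from collections import Counter


def key_pattern(key, separator=":"):
    """Replace purely numeric key segments with '*'.

    Assumes IDs are numeric segments, e.g. product:123:viewers.
    """
    parts = key.split(separator)
    return separator.join("*" if part.isdigit() else part for part in parts)


def group_by_pattern(slow_commands):
    """Count (command, key) pairs by (COMMAND, key pattern), most frequent first."""
    counts = Counter()
    for command, key in slow_commands:
        counts[(command.upper(), key_pattern(key))] += 1
    return counts.most_common()
```

Grouping this way shows that many individual slow keys often collapse into one or two access patterns, which is what you need to assign the fix to a team.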

Performance Tuning Checklist Summary

  1. Enable and Monitor SLOWLOG: Periodically review SLOWLOG GET to identify recurring slow commands. Adjust slowlog-log-slower-than if necessary.
  2. Use MONITOR Cautiously: Reserve it for short real-time debugging sessions during suspected slowdowns, and stop it immediately afterward.
  3. Avoid KEYS on large production keyspaces: Use SCAN for incremental iteration when key discovery is genuinely needed.
  4. Optimize Lua Scripts: Ensure EVAL and EVALSHA scripts are efficient and don't run excessively long.
  5. Process Large Data Structures Iteratively: Use SSCAN, ZSCAN, LRANGE with limits, or SCAN instead of fetching entire collections.
  6. Analyze Command Arguments: Ensure the arguments passed to commands are not causing unexpected behavior (e.g., very large counts, complex patterns).
  7. Monitor Server Resources: Keep an eye on Redis server CPU, memory, and network usage. Slow commands can sometimes be a symptom of a strained server.
  8. Client-Side Optimizations: Verify that your application isn't sending commands too rapidly or in inefficient batches. Consider pipelining for multiple commands where appropriate.

Final Check

Use SLOWLOG to find commands that are slow inside Redis, latency tools to catch server-side spikes, and application timing to catch client waiting. Then fix the access pattern, not only the threshold. Bounded commands, smaller replies, sensible batching, and clear ownership of large keys do more for Redis performance than chasing one-off tuning changes.