Top 5 Redis Performance Bottlenecks and How to Fix Them

Unlock peak performance from your Redis deployments with this essential guide to common bottlenecks. Learn to identify and resolve issues like slow O(N) commands, excessive network round trips, memory pressure and inefficient eviction policies, persistence overheads, and CPU-bound operations. This article provides actionable steps, practical examples, and best practices, from leveraging pipelining and `SCAN` to optimizing data structures and persistence, ensuring your Redis instance remains fast and reliable for all your caching, messaging, and data storage needs.

Redis performance problems usually look mysterious until you remember one thing: Redis is fast, but it is not exempt from work. A command that walks a million keys still walks a million keys. A client that sends one command per network round trip still pays for every round trip. A server that runs out of memory still has to evict, swap, reject writes, or fall over depending on configuration.

When Redis slows down, do not start by changing random settings. Start with evidence:

redis-cli INFO
redis-cli SLOWLOG GET 20
redis-cli LATENCY DOCTOR
redis-cli INFO commandstats
redis-cli INFO memory

Those commands usually point toward one of five bottlenecks: slow commands, network round trips, memory pressure, persistence overhead, or CPU saturation.

1. Slow commands on large data

Redis has many tiny constant-time operations, but not every command is tiny. Commands such as KEYS, large LRANGE, SMEMBERS, HGETALL, ZRANGE over huge ranges, SORT, and long Lua scripts can block other clients while they run.

The classic incident starts with a cleanup or debugging command:

KEYS *

On a small development instance, it returns immediately. On a production keyspace with millions of keys, it can stall the server long enough for application requests to pile up. The same pattern happens with a hash that began as "a few fields per user" and quietly became a giant object.

Find the evidence:

redis-cli SLOWLOG GET 20
redis-cli INFO commandstats
redis-cli LATENCY LATEST

SLOWLOG records commands that exceeded the configured threshold. INFO commandstats shows per-command call counts and cumulative time. If one command dominates time, start there.
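
To see which command dominates, it can help to rank the commandstats output by cumulative time. The following is a minimal sketch, not an official tool: it parses the text that redis-cli INFO commandstats prints, and the function names and the sample numbers are made up for illustration.

```python
def parse_commandstats(raw):
    """Parse the text of INFO commandstats into {command: {field: value}}."""
    stats = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the "# Commandstats" header and blank lines
        name, _, fields = line.partition(":")
        parsed = {}
        for pair in fields.split(","):
            key, _, value = pair.partition("=")
            if key and value:
                parsed[key] = float(value)
        stats[name[len("cmdstat_"):]] = parsed
    return stats


def top_by_total_time(stats, limit=5):
    """Return (command, total_usec, usec_per_call) rows, slowest total first."""
    rows = [
        (cmd, s.get("usec", 0.0), s.get("usec_per_call", 0.0))
        for cmd, s in stats.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Made-up sample in the shape Redis prints; real output carries more fields.
SAMPLE = """\
# Commandstats
cmdstat_get:calls=120000,usec=240000,usec_per_call=2.00
cmdstat_keys:calls=3,usec=4500000,usec_per_call=1500000.00
cmdstat_hgetall:calls=900,usec=1800000,usec_per_call=2000.00
"""

for cmd, total_usec, per_call in top_by_total_time(parse_commandstats(SAMPLE)):
    print(f"{cmd:10s} total={total_usec / 1e6:.2f}s per_call={per_call:.0f}us")
```

In this sample, three KEYS calls cost more total time than 120,000 GETs, which is the kind of imbalance worth chasing first.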

Fix the access pattern:

redis-cli --scan --pattern 'user:*'

Use SCAN instead of KEYS for keyspace iteration. Use HSCAN, SSCAN, and ZSCAN for large hashes, sets, and sorted sets. Fetch pages or ranges instead of whole structures:

LRANGE feed:user:42 0 49
ZRANGE leaderboard 0 99 WITHSCORES
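
In application code, the SCAN advice above comes down to a cursor loop. Here is a rough sketch, assuming a client that exposes redis-py's scan(cursor=..., match=..., count=...) and returns (next_cursor, keys). FakeScanClient is a hypothetical in-memory stand-in so the example runs without a server.

```python
import fnmatch


def iter_keys(client, pattern, count=500):
    """Yield keys matching pattern by following the SCAN cursor.

    COUNT is only a hint to the server, not a guaranteed page size.
    """
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:  # the server returns cursor 0 when iteration is done
            break


class FakeScanClient:
    """Hypothetical stand-in that pages through a sorted key list."""

    def __init__(self, keys, page=2):
        self._keys = sorted(keys)
        self._page = page

    def scan(self, cursor=0, match=None, count=None):
        chunk = self._keys[cursor:cursor + self._page]
        next_cursor = cursor + self._page
        if next_cursor >= len(self._keys):
            next_cursor = 0
        matched = [k for k in chunk if fnmatch.fnmatchcase(k, match or "*")]
        return next_cursor, matched


client = FakeScanClient(["user:1", "user:2", "order:9", "user:3", "cart:1"])
print(list(iter_keys(client, "user:*")))
```

With a real redis-py client, scan_iter wraps the same loop. A real SCAN can return a key more than once, so deduplicate if the caller needs uniqueness.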

If an object has grown too large, split it around how the application reads it. A single user:42 hash with thousands of unrelated fields may be convenient for writes but painful for reads that only need profile settings. Separate keys such as user:42:profile, user:42:prefs, and user:42:counters can reduce the amount of data touched per request.

For deletion, prefer UNLINK when values may be large:

UNLINK old:large:set

UNLINK removes the key from the keyspace and frees memory asynchronously. It is safer than DEL for large values, though bulk cleanup still needs throttling.
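
One way to throttle a bulk cleanup is to UNLINK in small batches and pause between them. The sketch below assumes a client with redis-py's unlink(*names). The batch size and pause are arbitrary starting points, and FakeUnlinkClient is a hypothetical stand-in for a real connection.

```python
import time


def unlink_in_batches(client, keys, batch_size=100, pause_seconds=0.05,
                      sleep=time.sleep):
    """UNLINK keys in batches, sleeping between full batches.

    Returns the number of keys the server reported as removed.
    """
    removed = 0
    batch = []
    for key in keys:
        batch.append(key)
        if len(batch) >= batch_size:
            removed += client.unlink(*batch)
            batch = []
            sleep(pause_seconds)  # give other clients room between batches
    if batch:
        removed += client.unlink(*batch)
    return removed


class FakeUnlinkClient:
    """Hypothetical in-memory stand-in that counts UNLINK calls."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.calls = 0

    def unlink(self, *names):
        self.calls += 1
        present = self.keys.intersection(names)
        self.keys -= present
        return len(present)


old_keys = [f"old:{i}" for i in range(250)]
client = FakeUnlinkClient(old_keys)
pauses = []  # record pauses instead of sleeping in this demo
removed = unlink_in_batches(client, old_keys, sleep=pauses.append)
print(removed, client.calls, len(pauses))
```

In Redis Cluster, a multi-key UNLINK only works when all keys hash to the same slot, so this shape would need per-slot grouping there.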

2. Too many network round trips

Redis may process a command in microseconds while your application spends milliseconds waiting on the network. If a request path sends 50 sequential Redis commands, the network can dominate the total time even when Redis itself is healthy.

This is common in code like:

for user_id in user_ids:
    profile = redis.get(f"user:{user_id}:profile")

Each GET waits for its own response before the next starts. Across a network, that is expensive.

Use pipelining:

pipe = redis.pipeline(transaction=False)
for user_id in user_ids:
    pipe.get(f"user:{user_id}:profile")
profiles = pipe.execute()

Pipelining sends multiple commands without waiting for each reply individually. Redis still executes commands in order, but the client avoids paying a round trip for every command.

Use multi-key commands where they fit:

MGET user:1:profile user:2:profile user:3:profile

Do not turn every request into one enormous pipeline. Large pipelines can increase memory use and create response bursts. Batch enough to remove obvious round-trip waste, then measure.
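
A bounded pipeline is one way to batch without building a single enormous request. This is a minimal sketch, assuming redis-py's pipeline(transaction=False) interface. The batch size of 200 is an arbitrary starting point, and the Fake classes are hypothetical stand-ins so the example runs without a server.

```python
def get_many(client, keys, batch_size=200):
    """Fetch keys with one pipeline per batch instead of one GET per key."""
    keys = list(keys)
    values = []
    for start in range(0, len(keys), batch_size):
        pipe = client.pipeline(transaction=False)
        for key in keys[start:start + batch_size]:
            pipe.get(key)
        values.extend(pipe.execute())  # one round trip per batch
    return dict(zip(keys, values))


class FakePipeline:
    """Hypothetical stand-in: queues GETs and answers them on execute()."""

    def __init__(self, owner):
        self._owner = owner
        self._queued = []

    def get(self, key):
        self._queued.append(key)
        return self

    def execute(self):
        self._owner.round_trips += 1
        replies = [self._owner.store.get(k) for k in self._queued]
        self._queued = []
        return replies


class FakePipelineClient:
    def __init__(self, store):
        self.store = store
        self.round_trips = 0

    def pipeline(self, transaction=True):
        return FakePipeline(self)


store = {f"user:{i}:profile": f"p{i}" for i in range(450)}
client = FakePipelineClient(store)
profiles = get_many(client, store.keys(), batch_size=200)
print(len(profiles), "values in", client.round_trips, "round trips")
```

Here 450 reads cost three round trips instead of 450. The right batch size depends on payload sizes and client memory, so treat it as something to measure.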

For read-heavy paths, also check whether your application repeatedly asks Redis for values that could be fetched once and reused for the duration of a request. A small local request cache can remove accidental duplicate reads without changing Redis at all.
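
A request-local cache can be very small. The sketch below is one possible shape, not a library API: RequestCache is a hypothetical name, and fetch stands for whatever callable reads from Redis, such as a client's get method.

```python
class RequestCache:
    """Memoize reads for the lifetime of one request, then throw it away."""

    _MISSING = object()

    def __init__(self, fetch):
        self._fetch = fetch  # e.g. a bound Redis client .get
        self._seen = {}

    def get(self, key):
        value = self._seen.get(key, self._MISSING)
        if value is self._MISSING:
            value = self._fetch(key)
            # None is cached too, so repeated misses are not refetched.
            self._seen[key] = value
        return value


calls = []


def fake_fetch(key):
    """Hypothetical stand-in for a Redis GET that records each call."""
    calls.append(key)
    return {"flags:site": "on"}.get(key)


cache = RequestCache(fake_fetch)
cache.get("flags:site")
cache.get("flags:site")
cache.get("missing")
cache.get("missing")
print(calls)
```

Create one instance per request and let it go out of scope afterwards. Keeping it longer turns it into a second cache with its own staleness problems.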

3. Memory pressure and eviction churn

Redis keeps its working dataset in memory. Once memory is tight, performance gets worse in several ways: eviction costs CPU, writes may fail under noeviction, persistence forks may become harder, replicas may lag, and the operating system may swap if the host is misconfigured or overloaded.

Check memory:

redis-cli INFO memory
redis-cli INFO stats | grep evicted_keys
redis-cli CONFIG GET maxmemory
redis-cli CONFIG GET maxmemory-policy

Important signs:

  • used_memory is close to maxmemory.
  • evicted_keys is increasing quickly.
  • The host is swapping.
  • Big keys consume more memory than expected.
  • Cache keys that are supposed to expire have no TTL.
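
The first two signs are easy to check automatically. Here is a rough sketch that takes a dict of INFO fields (used_memory and maxmemory from INFO memory, evicted_keys from INFO stats). The function name and the 90% threshold are assumptions for illustration, not recommended values.

```python
def memory_warnings(info, previous_evicted=None, headroom_ratio=0.9):
    """Return human-readable warnings from a dict of INFO fields."""
    warnings = []
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))
    if limit == 0:
        warnings.append("maxmemory is 0 (no limit configured)")
    elif used >= headroom_ratio * limit:
        warnings.append(f"used_memory is at {used / limit:.0%} of maxmemory")

    # evicted_keys is a running counter, so compare against an earlier sample.
    evicted = int(info.get("evicted_keys", 0))
    if previous_evicted is not None and evicted > previous_evicted:
        warnings.append(
            f"{evicted - previous_evicted} keys evicted since last sample"
        )
    return warnings


# Made-up numbers for illustration.
sample = {"used_memory": 950, "maxmemory": 1000, "evicted_keys": 120}
for warning in memory_warnings(sample, previous_evicted=100):
    print(warning)
```

Swap, big keys, and missing TTLs need other tools: host metrics for swap, and key-level inspection for the rest.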

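Find large keys carefully. Do not run broad expensive commands during peak traffic. redis-cli --bigkeys walks the keyspace with SCAN, so it does not block the server the way KEYS does, but it still visits every key. Run it off-peak, or slow it down with -i:

redis-cli --bigkeys -i 0.1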

For caches, set a memory limit and an eviction policy that matches the data:

maxmemory 4gb
maxmemory-policy allkeys-lru

allkeys-lru or allkeys-lfu can make sense when all keys are cache entries. volatile-lru only evicts keys with TTLs, which is useful when persistent keys share the instance with cache keys. noeviction is often right for Redis used as a primary data store, because silently evicting durable-looking data would be worse than returning an error.

Set TTLs at write time:

SET cache:product:123 "$json" EX 300

For session stores, be deliberate. A session key without an expiration is usually a bug. For rate limiters, counters should expire with the window. For Streams, trim old entries. Memory leaks in Redis are often application data that never received a lifecycle.
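
For the rate-limiter case, giving the counter a lifecycle can look like the sketch below. It is a simple fixed-window counter built on INCR and EXPIRE, assuming a client with redis-py's incr and expire methods. The key naming and limits are made up, and FakeCounterClient is a hypothetical stand-in.

```python
def allow_request(client, user_id, limit=100, window_seconds=60):
    """Fixed-window counter: the key expires when the window ends."""
    key = f"rate:{user_id}"
    count = client.incr(key)
    if count == 1:
        # First hit in this window: attach the TTL so the key cannot leak.
        client.expire(key, window_seconds)
    return count <= limit


class FakeCounterClient:
    """Hypothetical in-memory stand-in that records TTLs."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def incr(self, name):
        self.values[name] = self.values.get(name, 0) + 1
        return self.values[name]

    def expire(self, name, seconds):
        self.ttls[name] = seconds
        return True


client = FakeCounterClient()
results = [allow_request(client, 42, limit=2, window_seconds=60) for _ in range(3)]
print(results, client.ttls)
```

INCR and EXPIRE are two separate commands here. If the client dies between them, the counter is left without a TTL, so a production version would usually make the pair atomic, for example with a transaction or a script.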

Also reduce key and value overhead where it matters. Thousands of tiny keys can cost more metadata than expected. Sometimes a compact hash is better than many individual keys; sometimes the opposite is true because reads only need one field. Measure with real access patterns instead of assuming one shape is always best.

4. Persistence and disk I/O stalls

Persistence protects data, but it introduces disk and fork behavior you need to understand. RDB snapshots and AOF rewrites are normally background operations, but they can still cause latency through fork time, copy-on-write memory pressure, and disk I/O.

Check persistence state:

redis-cli INFO persistence
redis-cli LATENCY LATEST
iostat -xz 1

Look for failed background saves, long fork times, AOF rewrite activity, and disk saturation. If latency spikes line up with BGSAVE or BGREWRITEAOF, persistence tuning belongs on the shortlist.
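
Those checks can be scripted against INFO fields. The following sketch assumes a dict that merges INFO persistence with INFO stats, because latest_fork_usec is reported under stats. The function name and the 100 ms fork threshold are illustrative assumptions.

```python
def persistence_warnings(info, max_fork_usec=100_000):
    """Flag persistence trouble from a merged dict of INFO fields."""
    warnings = []
    if info.get("rdb_last_bgsave_status", "ok") != "ok":
        warnings.append("last RDB background save failed")
    if info.get("aof_last_bgrewrite_status", "ok") != "ok":
        warnings.append("last AOF rewrite failed")
    if info.get("aof_last_write_status", "ok") != "ok":
        warnings.append("last AOF write failed")
    if int(info.get("rdb_bgsave_in_progress", 0)):
        warnings.append("RDB background save in progress")
    if int(info.get("aof_rewrite_in_progress", 0)):
        warnings.append("AOF rewrite in progress")
    fork_usec = int(info.get("latest_fork_usec", 0))
    if fork_usec > max_fork_usec:
        warnings.append(f"last fork took {fork_usec / 1000:.0f} ms")
    return warnings


# Made-up values for illustration.
sample = {
    "rdb_last_bgsave_status": "err",
    "aof_rewrite_in_progress": "1",
    "latest_fork_usec": "250000",
}
for warning in persistence_warnings(sample):
    print(warning)
```

Disk saturation does not show up in INFO, which is why iostat stays on the checklist.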

For AOF, the main durability/performance setting is:

appendfsync everysec

everysec is the usual balanced choice: Redis fsyncs about once per second, so a crash can lose roughly the last second of writes. always fsyncs after every write and can be very slow. no leaves syncing to the operating system and accepts more data-loss risk on crash.

For RDB, avoid snapshot rules that fire constantly on a busy write workload unless that is intentional:

save 900 1
save 300 10
save 60 10000

These rules match the defaults shipped with older Redis releases, and they are not automatically right for every workload. A high-write Redis used as a disposable cache may not need persistence at all. A Redis instance used as a job queue or session store probably does, but the acceptable loss window must be clear.

If persistence competes with application traffic, consider:

  • Faster local SSD storage.
  • Separating Redis persistence from other disk-heavy services.
  • Running persistence on a replica when the primary can tolerate that design.
  • Keeping dataset size below what the host can fork comfortably.
  • Setting Linux vm.overcommit_memory=1 where Redis recommends it for background saves.

Do not disable persistence blindly to "fix performance" unless the data is truly disposable. It may make the graph look better while turning a restart into data loss.

5. CPU saturation and single-threaded command execution

Redis command execution is largely single-threaded, even though modern Redis uses additional threads for some I/O and background work. If one core is pegged by Redis, adding more idle cores on the same instance may not help the hot command path.

Check the host and Redis command mix:

top -H -p "$(pgrep -d, redis-server)"
redis-cli INFO commandstats
redis-cli SLOWLOG GET 20
redis-cli INFO clients

Common CPU causes:

  • Large set, sorted set, list, or hash operations.
  • Heavy Lua scripts.
  • Serialization choices in the application that produce larger values than expected, which Redis then has to copy and send.
  • Very high Pub/Sub fan-out.
  • Expensive eviction under memory pressure.
  • Too many connections constantly reconnecting or issuing small commands.

Fix CPU by reducing work, splitting work, or distributing work.

Reduce work by changing commands and data shapes. If you only need 50 items, do not fetch 5,000. If every request parses a 500 KB JSON blob to read one flag, split that flag into a smaller key or field.

Split work by moving long loops to clients using incremental scans:

HSCAN big:hash 0 COUNT 100

Distribute work with replicas for reads or Redis Cluster for sharding. Replicas help read-heavy traffic, but they do not make writes cheaper on the primary. Redis Cluster distributes keys across primaries, which can increase total CPU and memory capacity, but it also adds operational complexity and key-slot constraints.

For Pub/Sub, watch output buffers and fan-out:

redis-cli PUBSUB NUMSUB events:updates
redis-cli CLIENT LIST

A slow subscriber can turn into memory pressure. Thousands of subscribers can turn one publish into a large amount of network output. If Pub/Sub is heavy, consider isolating it on a separate Redis instance.
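
To spot slow subscribers, look at the omem field in CLIENT LIST, which reports each client's output buffer memory. Here is a small sketch, assuming a list of dicts with string values in the shape redis-py's client_list() returns. The function name, threshold, and sample addresses are made up.

```python
def clients_with_large_output(clients, min_omem_bytes=1_000_000):
    """Return (addr, flags, omem) for clients with big output buffers."""
    flagged = []
    for entry in clients:
        omem = int(entry.get("omem", 0))
        if omem >= min_omem_bytes:
            flagged.append((entry.get("addr", "?"), entry.get("flags", ""), omem))
    # Largest buffers first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)


# Made-up sample; the P flag marks a Pub/Sub subscriber.
sample_clients = [
    {"addr": "10.0.0.7:50999", "flags": "P", "omem": "1500000"},
    {"addr": "10.0.0.6:50300", "flags": "N", "omem": "0"},
    {"addr": "10.0.0.5:50122", "flags": "P", "omem": "4200000"},
]
for addr, flags, omem in clients_with_large_output(sample_clients):
    print(f"{addr} flags={flags} omem={omem / 1e6:.1f} MB")
```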

A fast triage workflow

When Redis latency rises, run these checks in order:

redis-cli --latency
redis-cli SLOWLOG GET 10
redis-cli LATENCY DOCTOR
redis-cli INFO memory
redis-cli INFO clients
redis-cli INFO persistence
redis-cli INFO commandstats

Then ask:

  • Did a slow command appear?
  • Did memory hit maxmemory or start evicting?
  • Did persistence start a save or rewrite?
  • Did connected clients or blocked clients jump?
  • Did one command's call count or time explode?
  • Did application deploys change Redis access patterns?

Most Redis bottlenecks are not solved by one magic setting. They are solved by making the workload smaller, more incremental, more batched, or better isolated. The best Redis deployments are boring: keys expire when they should, big operations are paged, clients pipeline wisely, persistence settings match the data's value, and monitoring catches the trend before users feel it.

What to measure before and after a fix

A performance fix is only real if the graph changes for the workload that matters. Before changing code or config, capture a small baseline:

redis-cli INFO stats
redis-cli INFO commandstats
redis-cli INFO memory
redis-cli INFO persistence
redis-cli SLOWLOG GET 20
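
Most INFO stats fields are running counters, so a baseline is more useful as a difference between two snapshots than as a single reading. This is a minimal sketch with a hypothetical helper name. The field names are standard INFO stats counters and the sample numbers are made up.

```python
DEFAULT_COUNTERS = (
    "evicted_keys",
    "expired_keys",
    "total_commands_processed",
    "keyspace_misses",
    "rejected_connections",
)


def counter_deltas(before, after, fields=DEFAULT_COUNTERS):
    """Subtract two INFO snapshots for the counters present in both."""
    return {
        field: int(after[field]) - int(before[field])
        for field in fields
        if field in before and field in after
    }


# Made-up snapshots taken some interval apart.
before = {"evicted_keys": "100", "total_commands_processed": "5000"}
after = {"evicted_keys": "340", "total_commands_processed": "9000"}
print(counter_deltas(before, after))
```

A negative delta usually means the counters were reset between snapshots, for example by a restart, so discard that interval instead of trusting it.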

At the system level, capture CPU, disk, memory, swap, and network throughput. If Redis runs in a container, check both container limits and host pressure. A Redis process can look fine inside its own memory view while the host is under disk or CPU pressure from another service.

After the change, compare:

  • p50, p95, and p99 application latency for Redis-backed requests.
  • Redis command latency, not just request latency.
  • Slowlog entries by command.
  • Eviction rate and memory headroom.
  • Connected clients and rejected connections.
  • Persistence fork time and AOF/RDB status.
  • Replica lag if replicas serve reads or protect durability.
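
For the latency lines in that comparison, a nearest-rank percentile over raw samples is enough to put before and after side by side. The sketch below uses made-up sample data and hypothetical function names. Your metrics system may compute percentiles differently, so compare like with like.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 0 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(samples):
    """Return p50, p95, and p99 for a list of latency samples."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


# Made-up latencies in milliseconds: 1..100 before, half that after.
before_ms = list(range(1, 101))
after_ms = [value / 2 for value in before_ms]
print("before", summarize(before_ms))
print("after ", summarize(after_ms))
```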

Be suspicious of fixes that only move the pain. For example, a large pipeline may reduce request latency but increase memory spikes. Disabling AOF may remove disk latency but weaken recovery. Increasing maxmemory may delay evictions but starve the host if the machine was already shared.

One useful practice is to write a small load test around the exact Redis pattern you changed. If the old code did 40 sequential GETs, test sequential GETs versus MGET or pipelining with realistic payload sizes. If the old code used HGETALL, test HGET for the fields actually needed by the request. Redis tuning is much easier when you benchmark the shape you actually run, not a generic "Redis ops per second" number.

Finally, keep the rollback simple. A Redis performance change often sits in application code, client settings, and server config at the same time. Change one thing when you can. If you must change several things, write down which symptom each change is supposed to improve. That prevents the next engineer from inheriting a pile of mysterious settings that nobody wants to remove.