Troubleshooting Common Redis Connection Errors Effectively

Redis connection errors are usually straightforward to diagnose once you separate them into three questions: can the client reach the host and port, does Redis accept the connection, and is the client allowed to run commands after connecting?

Do that in order. Jumping straight into application code wastes time when Redis is stopped. Rebuilding a firewall rule wastes time when the password is wrong. A small, repeatable checklist gets you to the real failure faster.

First, test from the same place as the application

Testing from your laptop is useful, but it does not prove that a Kubernetes pod, VM, container, or CI runner can reach Redis. Start inside the same network location as the failing application.

redis-cli -h redis.example.internal -p 6379 PING

Expected output:

PONG
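Minimal application images often lack redis-cli. The sketch below sends the same PING using only the Python standard library. It assumes Python 3 and a plain (non-TLS) endpoint. Clients send commands as RESP arrays of bulk strings, so PING travels as *1\r\n$4\r\nPING\r\n on the wire. The function names are illustrative and are not part of any client library.

```python
import socket


def resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def ping(host: str, port: int = 6379, timeout: float = 2.0) -> str:
    """Send PING over plain TCP and return the first reply line, e.g. '+PONG'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(resp_command("PING"))
        reply = b""
        # Keep reading until the first reply line is complete.
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                raise ConnectionError("connection closed before a full reply")
            reply += chunk
        return reply.decode(errors="replace").strip()


# Usage (hypothetical host): ping("redis.example.internal", 6379)
```

A reply of +PONG means the TCP path and the Redis listener both work. A reply such as -NOAUTH Authentication required. means the connection itself is fine, and authentication is the next thing to check.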

If Redis requires TLS, use the TLS options your deployment expects:

redis-cli --tls -h redis.example.internal -p 6380 PING

If Redis requires authentication:

redis-cli -u redis://app-user:PASSWORD@redis.example.internal:6379 PING

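Be careful with passwords on the command line. A password passed with -u or -a ends up in shell history and can be visible in process listings, and redis-cli prints a warning about this. For production, prefer temporary credentials, or set the password in the REDISCLI_AUTH environment variable, which redis-cli reads instead.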
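Application code usually reads the same URL from its environment. This sketch assumes a variable named REDIS_URL, which is a placeholder for whatever your deployment sets. It uses urllib.parse to report the endpoint without printing the password, so the result is safe to log at startup and compare with the endpoint you tested by hand. It assumes port 6379 when the URL omits one, and your client library's defaults may differ.

```python
import os
from urllib.parse import urlsplit


def describe_redis_url(url: str) -> dict:
    """Split a redis:// or rediss:// URL into parts without exposing the password."""
    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    db_path = parts.path.lstrip("/")
    return {
        "tls": parts.scheme == "rediss",           # rediss:// means TLS
        "host": parts.hostname,
        "port": parts.port or 6379,                # assumed default when omitted
        "username": parts.username or "default",   # empty user means the default user
        "has_password": bool(parts.password),      # report presence, never the value
        "db": int(db_path) if db_path else 0,
    }


def describe_from_env(var: str = "REDIS_URL") -> dict:
    """Describe the URL stored in an environment variable (the name is an assumption)."""
    url = os.environ.get(var)
    if not url:
        raise KeyError(f"{var} is not set")
    return describe_redis_url(url)
```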

Connection refused

ECONNREFUSED, Connection refused, or Could not connect to Redis usually means the connection attempt got an immediate rejection instead of silence: the target host was reached, but nothing accepted the connection on that port, or a firewall rejected it. The most common causes are simple:

Redis is not running.
The client is using the wrong host or port.
Redis is bound only to localhost.
A container or service mapping points to the wrong port.
A firewall actively rejects the connection.

On the Redis host, check the process and listener:

redis-cli PING
ps aux | grep '[r]edis-server'
ss -ltnp | grep redis

You want to see Redis listening on the expected address and port, commonly 127.0.0.1:6379, 0.0.0.0:6379, or a private interface address.

Check redis.conf or the effective config:

redis-cli CONFIG GET bind
redis-cli CONFIG GET port
redis-cli CONFIG GET protected-mode

If bind is 127.0.0.1, remote clients cannot connect directly. That is often intentional. Do not change it to 0.0.0.0 as a quick fix unless Redis is protected by authentication, ACLs, firewall rules, and private networking. Redis exposed on the public internet is a serious security incident waiting to happen.

In Docker, remember the difference between container port and host port:

docker ps
docker port <redis-container>

Inside a Docker Compose network, applications usually connect to the service name and internal port:

redis://redis:6379

From the host, they may connect to a published port such as localhost:6379 or localhost:6381, depending on the mapping.

Connection timeout

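A timeout means the client waited and did not complete the operation in time. A refused connection fails immediately, but a timeout usually points to a network path problem, such as a firewall or security group silently dropping packets, or to a server too busy to answer.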

Check the TCP path:

nc -vz redis.example.internal 6379
ping -c 5 redis.example.internal

ping is not perfect because ICMP may be blocked while TCP works, but it can reveal obvious DNS or routing mistakes. nc is closer to what the Redis client needs.
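When nc is not installed, the standard-library sketch below gives a similar answer. Unlike a bare exception trace, it names the failure mode. It assumes Python 3.3 or newer for the exception classes, and it only tries the first resolved address. The name probe is illustrative.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Rough Python equivalent of `nc -vz host port`; tries only the first address."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-error"      # the name did not resolve at all
    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            return "refused"    # immediate rejection: no listener, or a rejecting firewall
        except socket.timeout:
            return "timeout"    # silence: dropped packets, a bad route, or an overloaded host
        except OSError as exc:
            return f"error: {exc}"  # e.g. no route to host, network unreachable
    return "open"
```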

If TCP connects but Redis commands time out, check whether Redis is busy:

redis-cli INFO clients
redis-cli INFO stats
redis-cli INFO memory
redis-cli SLOWLOG GET 10
redis-cli LATENCY DOCTOR

Look for blocked clients, high connected client counts, memory near maxmemory, swap on the host, slow commands, and latency events. A single expensive command such as KEYS *, a large HGETALL, or a long Lua script can delay unrelated clients because Redis command execution is largely single-threaded.
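Reading INFO by eye works, but a small parser makes it easy to capture the same numbers in incident notes. This sketch assumes you saved the output of redis-cli INFO as text. INFO uses # Section headers and field:value lines. The thresholds in busy_signals are illustrative placeholders, not recommended values.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output ('# Section' headers, 'field:value' lines) into a flat dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def busy_signals(info: dict, client_warn: int = 1000) -> list:
    """Flag a few symptoms of a busy server. Thresholds are illustrative only."""
    signals = []
    if int(info.get("blocked_clients", 0)) > 0:
        signals.append(f"blocked_clients={info['blocked_clients']}")
    if int(info.get("connected_clients", 0)) > client_warn:
        signals.append(f"connected_clients={info['connected_clients']}")
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))  # 0 means no maxmemory limit is set
    if limit and used / limit > 0.9:
        signals.append(f"memory at {used / limit:.0%} of maxmemory")
    return signals
```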

Also check the client timeout settings. Some libraries use short defaults for connect timeouts or command timeouts. Raising the timeout may reduce false failures across a slow network, but it should not hide a Redis instance that is overloaded. If a simple PING takes seconds from the application host, fix that before tuning retries.
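To tell a slow network from a slow server, time the TCP connect and the command separately, each with its own budget. Many client libraries likewise expose a separate connect timeout and command timeout. This is a sketch that assumes a plain-TCP endpoint. The name timed_ping and the default timeouts are hypothetical.

```python
import socket
import time


def timed_ping(host: str, port: int, connect_timeout: float = 1.0,
               read_timeout: float = 2.0) -> dict:
    """Time the TCP connect and a PING round trip separately (plain TCP only)."""
    result = {"phase": "connect", "connect_ms": None, "ping_ms": None}
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError as exc:  # refused, timed out, unresolvable, unreachable
        result["error"] = repr(exc)
        return result
    with sock:
        result["connect_ms"] = (time.monotonic() - start) * 1000
        result["phase"] = "read"
        sock.settimeout(read_timeout)  # separate budget for the command itself
        start = time.monotonic()
        try:
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = sock.recv(64)
        except OSError as exc:
            result["error"] = repr(exc)
            return result
        result["ping_ms"] = (time.monotonic() - start) * 1000
        result["phase"] = "done"
        result["reply"] = reply.decode(errors="replace").strip()
    return result
```

If phase stays "connect", look at the network path. If the connect is fast but phase is "read", look at Redis load and the slow log.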

Name resolution and wrong endpoint problems

Not every connection error is Redis. DNS and service discovery cause plenty of them.

From the application host:

getent hosts redis.example.internal
nslookup redis.example.internal

In Kubernetes:

kubectl exec -it deploy/my-app -- sh
getent hosts redis.default.svc.cluster.local
nc -vz redis.default.svc.cluster.local 6379
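Slim images often ship without getent or nc. If the application runtime is Python, the sketch below resolves a name through the same system resolver the application uses, including search domains from /etc/resolv.conf. Short Kubernetes names such as redis then resolve the way they would for the app. The name resolve is illustrative.

```python
import socket


def resolve(host: str, port: int = 6379) -> list:
    """Return the unique IP addresses the system resolver gives for host, like getent hosts."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:  # keep order, drop duplicates
            addresses.append(ip)
    return addresses


# Usage (hypothetical service name): resolve("redis.default.svc.cluster.local")
```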

Check whether the application is using a read replica endpoint, a sentinel endpoint, a cluster endpoint, or a direct node endpoint. Redis Cluster clients need cluster-aware libraries because keys may belong to different slots and commands can receive redirects. A non-cluster-aware client may connect successfully and then fail with MOVED or ASK errors once it sends real commands.
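The sketch below is useful when you need to check which slot a key maps to, or why related keys land on different nodes. It follows the hash-slot rule from the Redis Cluster specification. The slot is the CRC16 (XMODEM variant) of the key modulo 16384. When the key contains a non-empty {...} hash tag, only the part inside the first tag is hashed. The result should agree with CLUSTER KEYSLOT. parse_redirect assumes the MOVED <slot> <host>:<port> reply shape, which ASK replies share.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 as used by Redis Cluster (XMODEM: poly 0x1021, init 0, no reflection)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key, honoring {hash tags}; comparable to CLUSTER KEYSLOT."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag counts
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def parse_redirect(error: str):
    """Split 'MOVED 3999 127.0.0.1:6381' (or ASK ...) into (kind, slot, host, port)."""
    kind, slot, endpoint = error.split()
    host, _, port = endpoint.rpartition(":")
    return kind, int(slot), host, int(port)
```

For example, key_slot("foo") returns 12182. The keys "{user1000}.following" and "{user1000}.followers" share a slot because only user1000 is hashed.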

Authentication errors

Authentication failures show up as:

NOAUTH Authentication required
WRONGPASS invalid username-password pair
NOPERM this user has no permissions
Client-library-specific authentication exceptions

For Redis 6 and newer, ACL users are common. A connection string may need both username and password:

redis://app-user:PASSWORD@redis.example.internal:6379/0

With the default user, some clients use only a password:

redis://:PASSWORD@redis.example.internal:6379/0

Check the active user configuration if you have admin access:

redis-cli ACL LIST
redis-cli ACL GETUSER app-user

NOAUTH means the client did not authenticate before issuing a command. WRONGPASS means authentication was attempted but rejected. NOPERM means authentication worked, but the user lacks permission for the command, key pattern, or Pub/Sub channel.
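When errors arrive wrapped in a client library exception, matching on the Redis reply prefix still works. This is a minimal sketch. The hint texts paraphrase the explanation above, and auth_hint is a hypothetical helper.

```python
AUTH_HINTS = {
    "NOAUTH": "no AUTH before the command: credentials are missing from the client config",
    "WRONGPASS": "AUTH was sent and rejected: check the username, password, and that the user is enabled",
    "NOPERM": "authenticated, but the ACL user lacks this command, key pattern, or channel",
}


def auth_hint(error_message: str):
    """Map a Redis error reply to a hint, or None if it is not auth-related."""
    # Raw RESP error replies start with '-'; library messages usually do not.
    prefix = error_message.lstrip("-").split(" ", 1)[0]
    return AUTH_HINTS.get(prefix)
```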

When secrets rotate, confirm that every running process actually received the new value. In container platforms, updating a secret object does not always restart existing pods or processes. A common real-world failure is half the application using the new password and half still using the old one.
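One way to verify a rollout without exposing the secret is to have each process log a short fingerprint of the credential it loaded at startup, then compare fingerprints across instances. Two different values in the same deployment mean that some processes still hold the old secret. This sketch assumes a long, random password. A plain hash of a short or guessable password can be brute-forced, so in that case treat the fingerprint as sensitive too. secret_fingerprint is a hypothetical helper.

```python
import hashlib


def secret_fingerprint(secret: str, length: int = 12) -> str:
    """Short SHA-256 prefix of a credential, for comparing rollouts across processes."""
    return hashlib.sha256(secret.encode()).hexdigest()[:length]


# Usage (hypothetical): log secret_fingerprint(password) once at startup, never the password.
```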

TLS mismatch

TLS mistakes can look like connection resets, timeouts, or unreadable protocol errors.

Check the port. Managed services often use different ports for TLS and non-TLS Redis. For example, one endpoint may expect plain Redis protocol and another may expect TLS from the first byte.

Test with:

redis-cli --tls -h redis.example.internal -p 6380 PING
redis-cli -h redis.example.internal -p 6379 PING

If your organization uses private certificates, the client may also need a CA file:

redis-cli --tls --cacert /path/to/ca.pem -h redis.example.internal -p 6380 PING

In application logs, certificate errors are often clearer than the top-level Redis exception. Look for messages about unknown authorities, expired certificates, host name mismatch, or handshake failure.
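In application code, the same checks happen in the TLS layer before Redis sees a single byte. This sketch assumes Python's ssl module and an optional private CA file. ssl.create_default_context verifies both the certificate chain and the host name, and a verification failure carries a readable verify_message. Other handshake failures, such as speaking TLS to a plain-text port, raise ssl.SSLError and are left to propagate. The name tls_ping is illustrative.

```python
import socket
import ssl


def tls_ping(host: str, port: int, cafile=None, timeout: float = 3.0) -> str:
    """PING over TLS, verifying the server certificate against cafile (or system CAs)."""
    context = ssl.create_default_context(cafile=cafile)  # checks chain and host name
    with socket.create_connection((host, port), timeout=timeout) as raw:
        try:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(b"*1\r\n$4\r\nPING\r\n")
                return tls.recv(64).decode(errors="replace").strip()
        except ssl.SSLCertVerificationError as exc:
            # verify_message holds text such as "certificate has expired"
            # or "unable to get local issuer certificate".
            return f"certificate problem: {exc.verify_message}"
```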

Too many connections

Redis has a maxclients limit. The operating system also has file descriptor limits. When either is exhausted, new clients may fail or existing clients may behave poorly.

Check:

redis-cli INFO clients
redis-cli CONFIG GET maxclients
ulimit -n

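Useful fields include connected_clients and blocked_clients from INFO clients, and rejected_connections from INFO stats. Note that ulimit -n reports the limit of your current shell, not necessarily that of the running Redis process. On Linux, check /proc/<pid>/limits for the redis-server process instead.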

Too many connections usually comes from one of these patterns:

Creating a new Redis client per web request.
Not closing clients in short-lived jobs.
Too many worker processes, each with its own large pool.
Pub/Sub subscriptions borrowing connections from a normal command pool.
Retry storms during a Redis restart.
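It helps to do the arithmetic before an incident. The sketch below computes the worst case, in which every pool is full at once. The parameter names are illustrative. 10000 is the default maxclients value. The 80% headroom is an arbitrary margin for admin tools, replicas, and reconnect overlap.

```python
def connection_budget(instances: int, processes_per_instance: int,
                      pool_size: int, pubsub_per_process: int = 0) -> int:
    """Worst-case client connections if every pool fills up at the same time."""
    per_process = pool_size + pubsub_per_process  # dedicated Pub/Sub connections add up too
    return instances * processes_per_instance * per_process


def fits(budget: int, maxclients: int = 10000, headroom: float = 0.8) -> bool:
    """True if the budget stays under maxclients with some margin (0.8 is arbitrary)."""
    return budget <= maxclients * headroom
```

For example, 30 application pods with 8 worker processes each and a pool of 50 connections can open 12,000 connections. That exceeds the default limit before any retry storm starts.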

Fix the application shape before raising limits. Use one shared client or a bounded pool per process. Add jittered reconnect backoff so every instance does not reconnect at the same millisecond after an outage.

Protected mode and bind settings

Redis protected mode is designed to reduce the damage from accidental exposure. If Redis is bound broadly and has no authentication, protected mode may reject remote connections.

Check:

redis-cli CONFIG GET protected-mode
redis-cli CONFIG GET bind
redis-cli CONFIG GET requirepass

Do not disable protected mode just to make a remote connection work. The safer path is usually private networking plus authentication and a narrow bind address. If Redis must accept remote clients, put it on a private subnet, restrict source IPs, require credentials, and use TLS where appropriate.
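If you already collect CONFIG GET results, a simple audit can flag the combinations above. This is a simplified sketch. It assumes a dict built from the name/value pairs that CONFIG GET returns. It treats an empty bind as listening on all interfaces, and it accepts the optional - prefix on bind addresses. It does not inspect ACL users, so passwords configured only through ACLs may not be visible to it. Avoid logging the requirepass value itself.

```python
def exposure_warnings(config: dict) -> list:
    """Flag risky bind/protected-mode/requirepass combinations (simplified audit)."""
    warnings = []
    bind = config.get("bind", "").split()
    # An empty bind listens everywhere; '-' marks an optional address.
    wide = not bind or any(
        addr.lstrip("-") in ("0.0.0.0", "*", "::", "::*") for addr in bind
    )
    no_password = not config.get("requirepass")
    if config.get("protected-mode") == "no":
        warnings.append("protected-mode is disabled")
    if wide and no_password:
        warnings.append("listening on all interfaces without requirepass")
    return warnings
```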

A practical order of operations

When an application cannot connect, use this sequence:

From the application environment, run redis-cli PING against the same host and port.
If refused, check Redis process, listener, bind, port, and container mapping.
If timed out, check routing, firewall rules, server load, slow commands, and client timeout settings.
If authentication fails, verify username, password, ACL permissions, and secret rollout.
If only some commands fail, check ACL command/key permissions and Redis Cluster redirects.
If failures happen under load, check connection counts, pool sizing, retries, and server resource metrics.

Connection troubleshooting is mostly evidence collection. Get a clean CLI result from the same place as the app, then compare it with what the client library is doing. Once those two paths differ, the gap is usually visible: a missing TLS flag, an old password, a wrong service name, or a pool that creates far more connections than Redis was sized to handle.

Reading application errors without overreacting

Client libraries wrap Redis errors in their own language. A Node.js service may show ECONNRESET, a Python worker may show redis.exceptions.ConnectionError, and a Java service may report a pool acquisition timeout. Those can all describe different layers of the same problem.

Separate them:

Connect timeout: the TCP connection did not complete quickly enough.
Read timeout: the connection exists, but a command response did not arrive in time.
Connection reset: the connection was closed abruptly by Redis itself, by a proxy or load balancer in between, or by a network device.
Pool timeout: the application could not borrow a Redis connection from its own pool.
Authentication error: Redis rejected the credentials or permissions.
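Tracking which phase failed matters because the same exception type means different things in different phases. A timeout while connecting points to the network path, while a timeout while reading usually points to load. This sketch uses Python's built-in socket exceptions. It assumes the caller supplies the phase. Client libraries wrap these exceptions in their own types.

```python
import socket


def classify_failure(exc: BaseException, phase: str) -> str:
    """Name the layer a failure came from; phase is 'connect', 'read', or 'pool'."""
    if phase == "pool":
        return "pool timeout: check the application's own pool, not Redis"
    if isinstance(exc, ConnectionRefusedError):
        return "connect refused: nothing accepted the TCP connection"
    if isinstance(exc, socket.timeout):
        return f"{phase} timeout"  # same exception, different meaning per phase
    if isinstance(exc, ConnectionResetError):
        return "connection reset by Redis, a proxy, or the network"
    if isinstance(exc, socket.gaierror):
        return "name resolution failed"
    return f"unclassified {type(exc).__name__} during {phase}"
```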

A pool timeout is easy to misread as a Redis outage. Sometimes Redis is fine, but the application borrowed every pool connection and never returned them. Pub/Sub misuse can cause this. So can long blocking commands, request handlers that forget to close clients, or a pool that is too small for the concurrency of the process.
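The pool-timeout case is easy to reproduce in a few lines. This minimal bounded pool caps connections with a semaphore and makes borrowers wait up to a timeout. It is a sketch with hypothetical names, not any library's implementation. A connection returns to the pool only when its with block exits cleanly. A caller that never leaves its block, such as a long-running Pub/Sub listener, therefore holds a slot indefinitely.

```python
import queue
import threading
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Raised when no connection could be borrowed in time (hypothetical name)."""


class BoundedPool:
    """Minimal bounded pool: at most `size` connections; borrowers wait up to `timeout`."""

    def __init__(self, factory, size: int, timeout: float):
        self._factory = factory            # callable that creates a new connection
        self._idle = queue.LifoQueue()     # reuse the most recently returned connection
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = timeout

    @contextmanager
    def connection(self):
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolTimeout("all connections are borrowed")
        try:
            try:
                conn = self._idle.get_nowait()
            except queue.Empty:
                conn = self._factory()
            yield conn
            # Reached only on a clean exit; after an error the connection is
            # dropped (a real pool would also close it).
            self._idle.put(conn)
        finally:
            self._slots.release()
```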

Check both sides at the same time. In the application, inspect pool metrics if the library exposes them: active connections, idle connections, waiters, retry count. In Redis, check:

redis-cli INFO clients
redis-cli CLIENT LIST | head

If Redis shows only a handful of clients but the application says its pool is exhausted, the issue is probably inside the application process. If Redis shows thousands of connections from the same deployment, the service may be creating clients too often.
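To see which source creates the most connections, group CLIENT LIST output by client address. This sketch assumes you saved the output of redis-cli CLIENT LIST as text. Each line is a set of key=value fields, and addr holds the client address and port. The split keeps everything before the last colon. Grouping by the name field works the same way if your clients call CLIENT SETNAME.

```python
from collections import Counter


def clients_by_ip(client_list: str) -> Counter:
    """Count connections per source address from CLIENT LIST output (one client per line)."""
    counts = Counter()
    for line in client_list.splitlines():
        # Fields look like 'id=3 addr=10.0.1.5:50188 name= ...'; values may be empty.
        fields = dict(item.split("=", 1) for item in line.split() if "=" in item)
        addr = fields.get("addr")
        if addr:
            ip = addr.rsplit(":", 1)[0]  # drop the client port
            counts[ip] += 1
    return counts
```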

Retries deserve special attention. A reconnect loop without backoff can turn a short Redis restart into a storm. Every application instance tries to reconnect immediately, authentication and TLS handshakes spike, and Redis has to recover while being hammered by clients. Use exponential backoff with jitter. Also decide which commands are safe to retry. Retrying an idempotent cache GET is different from retrying a write that may already have succeeded before the connection dropped.
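The sketch below applies both rules. It uses exponential backoff with "full jitter", where each delay is drawn uniformly between zero and a capped exponential bound. It retries only operations that the caller marks as idempotent. The base, cap, and attempt counts are placeholders. The retryable exceptions are Python's built-in ConnectionError and TimeoutError rather than any client library's types.

```python
import random
import time


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 5.0, rng=random):
    """Full-jitter backoff: delay n is uniform in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        yield rng.uniform(0, min(cap, base * 2 ** n))


def call_with_retry(operation, idempotent: bool, attempts: int = 5, sleep=time.sleep):
    """Retry only idempotent operations; a write may have applied before the drop."""
    delays = backoff_delays(attempts - 1)
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if not idempotent or attempt == attempts - 1:
                raise  # non-idempotent, or out of attempts
            sleep(next(delays))
```

Because every instance draws its own random delay, instances spread their reconnects out instead of arriving at the same moment after a restart.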

For incident notes, capture exact error text and timing. "Redis was down" is often wrong. "From 14:03 to 14:06 UTC, app pods saw read timeouts while Redis was saturating one CPU core and SLOWLOG showed large HGETALL calls" is actionable.