Redis Pub/Sub Message Loss: Causes and Reliable Alternatives
Discover why Redis Pub/Sub drops messages during network disconnects or slow consumers and explore patterns like Redis Streams and list-based queues for guaranteed delivery.
I remember the first time Redis Pub/Sub burned me. It was late, around 11 PM, and our notification system started dropping messages. Not all of them — just enough that users noticed before we did. The on-call engineer (me, unfortunately) spent two hours digging through application logs before the obvious truth surfaced: Redis Pub/Sub doesn't queue anything. It's not a message broker. It's a firehose, and if you're not standing directly in front of it with your mouth open, you're going to miss something.
That's the thing nobody tells you when you first reach for Redis Pub/Sub. It's right there in the documentation, technically, but it's easy to gloss over when you're excited about how simple the API is. You publish on one end, you subscribe on the other, and it works. Until it doesn't.
The fire-and-forget reality
Redis Pub/Sub operates on a brutally simple principle: when you publish a message, Redis pushes it to every connected subscriber in that channel at that exact moment. If a subscriber isn't connected, or if it's connected but can't keep up, the message evaporates. There's no persistence layer, no acknowledgment mechanism, no dead letter queue. The message exists only in transit.
Let me give you a concrete example. Say you've got a service that publishes order status updates, and another service that subscribes to send confirmation emails. Under normal load, everything hums along. Then your email service hiccups: maybe the SMTP relay is slow, maybe there's a garbage collection pause. During that hiccup, Redis keeps pushing messages. The unread data backs up, first in the socket buffers and then in the output buffer Redis keeps for that client. If the backlog grows past the configured limit, Redis closes the connection. When the subscriber reconnects, it picks up from now, not from where it left off. Everything still sitting in that buffer, plus every message published during the disconnect window, is gone.
I've measured this in practice with a simple test setup: a publisher firing 10,000 messages per second, and a subscriber that occasionally blocks for 50 milliseconds. Even with a single brief pause, you'll lose dozens of messages. The subscriber never knows they were sent. The publisher never knows they were lost. Redis is perfectly happy — it did exactly what it was designed to do.
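To make the delivery semantics concrete, here is a toy in-memory model. It is not Redis and it does not talk to Redis; `ToyBroker` and its methods are made-up names for illustration. The only behavior it copies from Pub/Sub is that a publish reaches whoever is subscribed at that instant and nobody else.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical stand-in that mimics deliver-now-or-never semantics."""

    def __init__(self):
        # channel name -> list of inboxes (plain lists) currently subscribed
        self._channels = defaultdict(list)

    def subscribe(self, channel, inbox):
        self._channels[channel].append(inbox)

    def unsubscribe(self, channel, inbox):
        # Compare by identity so two empty inboxes are not confused
        self._channels[channel] = [
            i for i in self._channels[channel] if i is not inbox
        ]

    def publish(self, channel, message):
        """Deliver to current subscribers only; nothing is stored for later."""
        receivers = self._channels[channel]
        for inbox in receivers:
            inbox.append(message)
        return len(receivers)


broker = ToyBroker()
email_inbox = []

broker.subscribe("orders", email_inbox)
broker.publish("orders", "order 1 confirmed")

broker.unsubscribe("orders", email_inbox)  # simulated disconnect
receivers_during_outage = broker.publish("orders", "order 2 confirmed")

broker.subscribe("orders", email_inbox)  # reconnect
broker.publish("orders", "order 3 confirmed")

print(email_inbox)
print(receivers_during_outage)
```

The second order never reaches the inbox, and neither side gets an error. The publish call simply reports that it had zero receivers.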
What actually causes message loss
There are three main scenarios where Pub/Sub drops messages, and they're all worth understanding because they'll show up in different ways.
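Network instability is the most obvious one. Any temporary network partition between the subscriber and Redis severs the connection, but Redis may not notice right away. A half-open connection is typically detected through TCP keepalive probes (the tcp-keepalive setting, 300 seconds by default in recent versions) or when a write finally fails. The idle-client timeout setting is disabled by default and does not apply to Pub/Sub subscribers in any case. From the moment the link breaks until the subscriber has reconnected and resubscribed, all published messages are lost to that subscriber. Other subscribers might get them fine, which makes debugging extra fun: you'll see inconsistent state across services and wonder if you're going crazy.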
Slow consumers are more insidious because the connection stays open. Redis uses a push model, meaning it writes to subscriber sockets as fast as publishers produce. If a subscriber can't read fast enough, the kernel socket buffers fill up and Redis starts queueing the unsent data in a per-client output buffer. That buffer is capped by the client-output-buffer-limit setting for the pubsub class (by default a 32 MB hard limit, or 8 MB sustained for 60 seconds). Once a limit is crossed, Redis closes the connection. The subscriber might not even notice it's behind until the disconnect happens.
I've seen this play out with subscribers that do synchronous database writes for each message. At low volume, it's fine. At peak, the database becomes the bottleneck, the subscriber falls behind, and messages pile up in that output buffer. When the backlog crosses the limit, Redis drops the connection, and the subscriber loses everything it hadn't yet read from the socket.
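How long can a subscriber stall before Redis cuts it off? Redis keeps an output buffer per client and closes the connection when the buffer exceeds the limit configured for the pubsub client class. The sketch below parses the value returned by CONFIG GET client-output-buffer-limit and does the back-of-the-envelope arithmetic. The function names are mine, the sample string mirrors the documented defaults, and the 1 KB payload is an assumption; check your own server's config before trusting the numbers.

```python
def parse_output_buffer_limits(raw: str) -> dict:
    """Parse the value of `CONFIG GET client-output-buffer-limit`.

    The value is a flat, space-separated run of
    <class> <hard bytes> <soft bytes> <soft seconds> groups.
    """
    parts = raw.split()
    if len(parts) % 4 != 0:
        raise ValueError(f"unexpected format: {raw!r}")
    limits = {}
    for i in range(0, len(parts), 4):
        client_class, hard, soft, seconds = parts[i:i + 4]
        limits[client_class] = {
            "hard_bytes": int(hard),
            "soft_bytes": int(soft),
            "soft_seconds": int(seconds),
        }
    return limits


def seconds_until_hard_limit(hard_bytes, payload_bytes, backlog_msgs_per_sec):
    """Rough time a fully stalled subscriber has before the hard limit trips.

    Ignores protocol framing and allocator overhead, so the real
    figure will be somewhat lower.
    """
    if hard_bytes == 0:  # 0 means "no limit" in the Redis config
        return float("inf")
    return hard_bytes / (payload_bytes * backlog_msgs_per_sec)


# Sample value assumed to match the defaults; yours may differ.
sample = "normal 0 0 0 replica 268435456 67108864 60 pubsub 33554432 8388608 60"
pubsub = parse_output_buffer_limits(sample)["pubsub"]

stall_budget = seconds_until_hard_limit(
    pubsub["hard_bytes"], payload_bytes=1024, backlog_msgs_per_sec=10_000
)
print(f"{stall_budget:.2f} seconds")
```

With these assumed numbers, a subscriber that stops reading entirely has a little over three seconds before the hard limit is reached. The soft limit can also disconnect a client that is only slightly too slow but stays over 8 MB for a full minute.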
Client disconnections during deployments or restarts are the third big category. If you're doing rolling deployments and a subscriber instance goes down, it misses everything published during its absence. There's no "catch me up" mechanism. When it comes back online, it starts fresh.
One thing that surprised me: even a clean shutdown doesn't help. If your subscriber gracefully unsubscribes before exiting, it still misses messages published between the unsubscribe and when it comes back. The unsubscribe is instantaneous — there's no "hold my messages for a minute" option.
When Pub/Sub is actually fine
I don't want to make it sound like Redis Pub/Sub is useless. It's excellent for specific use cases, and I still use it regularly. The key is understanding what those use cases are.
Real-time notifications where occasional loss is acceptable work beautifully. Think live sports scores, stock tickers, or typing indicators in a chat app. If a user misses a score update, the next one comes along in a few seconds anyway. The data has a short shelf life and no durability requirement.
Service discovery and configuration broadcasting are another sweet spot. When you change a feature flag and publish to all application instances, it's okay if an instance that's currently restarting misses the update — it'll pick up the current state when it comes back online or on the next periodic refresh.
I've also used Pub/Sub successfully for cache invalidation across multiple application servers. Publish a cache key to invalidate, and every server clears its local cache. If one server misses the message, the worst case is it serves stale data until the cache entry expires naturally. Not ideal, but not catastrophic either.
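The "expires naturally" part is what makes a missed invalidation survivable, so it's worth building in on purpose. Here is a minimal sketch of a local cache with a TTL safety net. `LocalCache` is a hypothetical class, not part of any Redis client; the idea is that your Pub/Sub message handler calls `invalidate`, and the TTL covers the cases where the message never arrives.

```python
import time


class LocalCache:
    """Hypothetical per-process cache: explicit invalidation plus a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._entries = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Safety net: stale data ages out even if the
            # invalidation message was never received.
            del self._entries[key]
            return None
        return value

    def invalidate(self, key):
        """Call this from the Pub/Sub message handler."""
        self._entries.pop(key, None)
```

The TTL then becomes the upper bound on how long a server can serve stale data after a lost message, which turns "we might miss an invalidation" into a number you can reason about.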
The common thread here: Pub/Sub works when messages are ephemeral by nature, when loss is recoverable through other mechanisms, and when you don't need ordering guarantees or exactly-once delivery.
Redis Streams: the built-in alternative
Redis Streams, introduced in Redis 5.0, is what I reach for now when I need reliable message delivery. It's not Pub/Sub with persistence bolted on — it's a fundamentally different model, closer to a distributed log like Kafka than a broadcast mechanism.
With Streams, messages are appended to a log and stay there until they are trimmed or deleted. Acknowledging a message only removes it from the consumer group's pending list; the entry itself remains in the stream. Consumers can disconnect, restart, fall behind, and still catch up. You cap the history with a maximum length (MAXLEN) or, in Redis 6.2+, a minimum ID (MINID, which works as a time cutoff because IDs start with a timestamp), so you control how much to keep.
Here's how the mental model differs. In Pub/Sub, you subscribe to a channel and messages flow to you. In Streams, you pull messages at your own pace. A consumer group tracks which messages have been delivered to each consumer and which of those have been acknowledged, so you can have multiple consumers reading from the same stream without duplication. If you want fan-out instead, give each downstream service its own group.
A basic Streams setup looks something like this:
XADD orders * status confirmed order_id 12345
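That appends a message to the orders stream. The * tells Redis to auto-generate an ID. The consumer group has to exist before anyone can read through it; XGROUP CREATE orders email-processor $ MKSTREAM creates it (and the stream, if it's missing). Then your consumer reads with: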
XREADGROUP GROUP email-processor worker-1 COUNT 10 STREAMS orders >
The > means "give me messages that haven't been delivered to any consumer in this group yet." After processing, the consumer acknowledges:
XACK orders email-processor <message-id>
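If the consumer crashes before acknowledging, the message stays in the group's pending entries list. Redis does not redeliver it on its own; another consumer in the group has to claim it with XCLAIM (or XAUTOCLAIM in Redis 6.2+) once it has been idle longer than a timeout you choose. This is the acknowledgment and redelivery mechanism that Pub/Sub completely lacks.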
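In application code, the read, process, acknowledge cycle might look like the sketch below. It assumes `client` is a redis-py `redis.Redis` instance created with `decode_responses=True`; `ensure_group`, `process_batch`, and `handler` are names I made up for illustration. Treat it as a starting point rather than a finished consumer.

```python
def ensure_group(client, stream, group):
    """Create the consumer group (and the stream) if they don't exist yet."""
    try:
        client.xgroup_create(stream, group, id="0", mkstream=True)
    except Exception as exc:
        # Redis answers with a BUSYGROUP error when the group already exists
        if "BUSYGROUP" not in str(exc):
            raise


def process_batch(client, stream, group, consumer, handler,
                  count=10, block_ms=5000):
    """Read up to `count` new entries and ack each one the handler completes.

    Returns the number of entries acknowledged.
    """
    acked = 0
    # ">" asks for entries never delivered to any consumer in this group
    response = client.xreadgroup(
        group, consumer, {stream: ">"}, count=count, block=block_ms
    )
    for _stream_name, entries in response or []:
        for entry_id, fields in entries:
            try:
                handler(fields)
            except Exception:
                # No ack: the entry stays in the pending entries list
                # and can be claimed and retried later.
                continue
            client.xack(stream, group, entry_id)
            acked += 1
    return acked
```

Acknowledging only after the handler returns gives you at-least-once delivery, so the handler should tolerate seeing the same message twice.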
The consumer group model in practice
Consumer groups are what make Streams genuinely useful for reliable processing. Each group maintains its own position in the stream, so you can have one group for email notifications, another for analytics, and another for audit logging — all reading the same stream independently.
Within a group, messages are distributed across consumers. This gives you horizontal scalability: add more consumer instances, and they'll share the load. If one instance dies, its pending messages become available for other instances to claim.
I've found that the pending entries list is invaluable for monitoring. You can run XPENDING to see which messages haven't been acknowledged and how long they've been outstanding. This surfaces slow consumers immediately — much better than discovering message loss days later through user complaints.
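A recovery pass built on XPENDING and XCLAIM could look like this. Again, `client` is assumed to be a redis-py `redis.Redis` instance; `claim_stale` and the 60-second idle threshold are my own choices for the sketch, not anything Redis prescribes.

```python
def claim_stale(client, stream, group, consumer,
                min_idle_ms=60_000, batch=100):
    """Take over entries that another consumer has left unacknowledged.

    Returns the claimed entries as (id, fields) pairs.
    """
    # Extended form of XPENDING: one record per pending entry
    pending = client.xpending_range(
        stream, group, min="-", max="+", count=batch
    )
    stale_ids = [
        p["message_id"]
        for p in pending
        if p["time_since_delivered"] >= min_idle_ms
    ]
    if not stale_ids:
        return []
    # XCLAIM re-checks the idle time on the server, so an entry that was
    # acked or claimed in the meantime is skipped rather than duplicated.
    return client.xclaim(
        stream, group, consumer,
        min_idle_time=min_idle_ms, message_ids=stale_ids,
    )
```

Each pending record also carries a `times_delivered` counter, which is a reasonable hook for routing poison messages to a separate stream instead of retrying them forever.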
One gotcha with Streams: auto-generated message IDs have the form <millisecond timestamp>-<sequence number>, and every new ID must be greater than the last one in the stream, which means you can't insert messages out of order. If you need strict ordering within a stream, this is actually a feature. If you need to prioritize certain messages, you'll need multiple streams or a different approach.
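Because the ID is two integers joined by a hyphen, it's easy to pull apart, and worth doing before you compare IDs in application code. A small sketch with helper names of my own choosing; the timestamp reading only makes sense for auto-generated IDs, since callers can supply arbitrary IDs of their own.

```python
from datetime import datetime, timezone


def parse_stream_id(entry_id: str) -> tuple:
    """Split '<ms>-<seq>' into a pair of ints that compares correctly."""
    ms, _, seq = entry_id.partition("-")
    return int(ms), int(seq or 0)


def stream_id_time(entry_id: str) -> datetime:
    """Server-side append time encoded in an auto-generated ID."""
    ms, _seq = parse_stream_id(entry_id)
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


first = parse_stream_id("1526919030474-55")
second = parse_stream_id("1526919030474-56")
print(first < second)
print(stream_id_time("1526919030474-55").isoformat())
```

Comparing the raw strings is a trap: "9-0" sorts after "10-0" as text, while the parsed tuples order the way Redis does.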
List-based queues for simpler needs
Before Streams existed, the standard pattern for reliable messaging with Redis was list-based queues with blocking pops. This pattern is still perfectly viable, especially if you're on an older Redis version or want something dead simple.
The idea is straightforward: producers LPUSH or RPUSH messages onto a list, and consumers do BLPOP or BRPOP to block until a message arrives. Push and pop from opposite ends (LPUSH with BRPOP, for example) if you want first-in, first-out order. The blocking pop is crucial: without it, you'd be polling, which wastes CPU and adds latency.
The reliability comes from a secondary "processing" list. The consumer atomically moves a message from the pending queue to a processing queue using BRPOPLPUSH (deprecated since Redis 6.2 in favor of BLMOVE, with LMOVE as the non-blocking form). After processing, it removes the message from the processing queue. If the consumer crashes, the processing queue retains the message, and a monitor process can move stale items back to the pending queue.
I've built this pattern several times, and it works, but it's more code than you'd expect. You need to handle timeouts, decide how long a message can sit in the processing queue before you consider it abandoned, and deal with edge cases around duplicate processing. Streams essentially formalize all of this, which is why I've mostly moved away from hand-rolled list queues.
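For reference, the core of the pattern fits in a few lines. This sketch assumes Redis 6.2+ (for BLMOVE and LMOVE) and a redis-py `redis.Redis` client; `take_one`, `requeue_all`, and the list names are hypothetical. Producers are assumed to LPUSH onto the pending list.

```python
def take_one(client, pending, processing, handler, timeout=5):
    """Move one item to the processing list, handle it, then remove it.

    Returns False if the queue stayed empty for `timeout` seconds.
    """
    # Atomic move: the item is never "in flight" outside of Redis
    raw = client.blmove(pending, processing, timeout, src="RIGHT", dest="LEFT")
    if raw is None:
        return False
    handler(raw)  # if this raises, the item stays parked in `processing`
    client.lrem(processing, 1, raw)
    return True


def requeue_all(client, pending, processing):
    """Naive recovery: push every parked item back onto the pending list.

    Only safe while no worker is mid-flight (for example at startup of a
    single worker). A real monitor needs per-item timestamps so it can
    pick out only the stale entries.
    """
    moved = 0
    # Newest parked item goes first so the oldest ends up next in line
    while client.lmove(processing, pending, src="LEFT", dest="RIGHT") is not None:
        moved += 1
    return moved
```

Even this stripped-down version shows where the extra code comes from: the timeout policy and the duplicate handling all live in your application instead of in Redis.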
The one place I still use list-based queues is for work queues where processing order doesn't matter and I want the absolute simplest implementation possible. Sometimes a list and a BLPOP loop is all you need, and adding Streams would be overengineering.
Pub/Sub sharding in Redis 7
Redis 7 introduced sharded Pub/Sub, which is worth mentioning because it solves a different problem than message loss. With regular Pub/Sub, every message is broadcast to every node in a cluster, even if no subscriber on a given node cares about that channel. This wastes cluster interconnect bandwidth.
Sharded Pub/Sub assigns each channel to a cluster hash slot, the same way keys are assigned. A message published with SPUBLISH only travels within the shard that owns that slot, and clients subscribe there with SSUBSCRIBE. It's a performance optimization, not a reliability feature. You'll still lose messages on disconnect. But if you're running Pub/Sub at scale in a clustered environment, it's worth knowing about.
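Sharded channels are mapped to slots with the same algorithm Redis Cluster uses for keys: CRC16 of the name (or of its hash tag, if it has one) modulo 16384. If you want to predict which channels land on the same shard, that calculation is short enough to sketch. `hash_slot` is my own helper name; in a live cluster, CLUSTER KEYSLOT gives you the authoritative answer.

```python
import binascii


def hash_slot(name: str) -> int:
    """Cluster hash slot for a key or sharded channel name."""
    key = name.encode()
    # Hash tag rule: if there is a "{" followed later by a "}" with at
    # least one character in between, only that substring is hashed.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 variant (XMODEM)
    # that the cluster specification calls for.
    return binascii.crc_hqx(key, 0) % 16384


print(hash_slot("foo"))
print(hash_slot("{orders}.created") == hash_slot("{orders}.shipped"))
```

Channels that share a hash tag always hash to the same slot, which is handy when one subscriber connection needs to follow several related channels.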
Making the choice: Pub/Sub vs Streams vs lists
After living with these patterns for years, my decision process has simplified to a few questions.
First: can you tolerate message loss? If yes, and if the data is ephemeral, Pub/Sub is probably fine. You'll get the lowest latency and the simplest operational model.
Second: do you need message persistence and replay? If yes, Streams is the answer. The ability to reprocess messages after a consumer bug fix has saved me more than once. With Pub/Sub, if your consumer had a bug that caused it to mishandle messages for an hour, those messages are gone forever. With Streams, you can reset the consumer group position and replay them.
Third: do you need multiple independent consumer groups reading the same data? Streams handles this natively. With Pub/Sub, every subscriber gets every message, which might be what you want, but there's no way to have different groups of subscribers maintaining independent positions.
Fourth: what's your Redis version? If you're stuck on something older than 5.0, Streams isn't available, and you're looking at list-based queues or an external message broker. I've been in this situation, and honestly, if you need reliable messaging and can't use Streams, I'd consider whether Redis is the right tool at all. RabbitMQ or NATS might be better fits.
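The replay mentioned under the second question comes down to moving the group's last-delivered ID backwards with XGROUP SETID. A minimal sketch, assuming a redis-py `redis.Redis` client and entries with auto-generated IDs; `id_for_time` and `rewind_group` are hypothetical helper names.

```python
from datetime import datetime, timezone


def id_for_time(moment: datetime) -> str:
    """Smallest stream ID at the given instant: '<ms since epoch>-0'."""
    return f"{int(moment.timestamp() * 1000)}-0"


def rewind_group(client, stream, group, since: datetime) -> str:
    """Point the group back in time so later '>' reads replay old entries.

    Entries with an ID greater than the returned one are delivered again.
    Anything already trimmed from the stream cannot be replayed.
    """
    start_id = id_for_time(since)
    client.xgroup_setid(stream, group, start_id)
    return start_id


# Example instant: the start of the hour in which the buggy consumer ran
bug_started = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(id_for_time(bug_started))
```

Replayed entries go through the normal consumer path, so this only works cleanly if the handlers are safe to run twice on the same message.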
The operational side nobody talks about
Here's something I learned the hard way: monitoring Pub/Sub is deceptively difficult. You can monitor connection counts and channel subscriptions with PUBSUB NUMSUB, but you can't see how many messages are being lost. There's no metric for "messages published but not received" because Redis doesn't track that.
With Streams, you get visibility. XINFO GROUPS shows each group's pending count and, from Redis 7.0, an estimated lag (entries not yet delivered to the group). XPENDING shows you unacknowledged messages. You can set up alerts when lag exceeds a threshold. This operational visibility alone has made Streams worth the switch for me.
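A threshold check on top of XINFO GROUPS can be as small as the sketch below. It assumes a redis-py `redis.Redis` client; `groups_over_threshold` and both default thresholds are placeholders I picked for illustration. The lag field is only reported by Redis 7.0+ and can be empty when the server cannot compute it, so the code treats a missing value as unknown rather than zero.

```python
def groups_over_threshold(client, stream, max_pending=1000, max_lag=1000):
    """Return (group, pending, lag) for every group that looks unhealthy."""
    alerts = []
    for group in client.xinfo_groups(stream):
        pending = group["pending"]  # delivered but not yet acknowledged
        lag = group.get("lag")      # not yet delivered; None if unknown
        too_many_pending = pending > max_pending
        too_far_behind = lag is not None and lag > max_lag
        if too_many_pending or too_far_behind:
            alerts.append((group["name"], pending, lag))
    return alerts
```

The two numbers point at different problems: a high pending count suggests consumers that read but fail to finish, while high lag suggests consumers that aren't reading fast enough, or at all.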
Memory management is another consideration. Pub/Sub messages exist only in memory and only while in flight, so memory usage depends on your publish rate and consumer speed and is capped per subscriber by the output buffer limit. Streams store messages until they're trimmed, so you need to think about retention policies. I typically set a maximum stream length (MAXLEN) based on expected throughput and available memory, and I monitor stream length to catch unexpected buildups.
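The sizing arithmetic is simple: throughput times the window you want to be able to replay, plus some headroom. A sketch with made-up helper names and an assumed 20% margin; `client` is again expected to be a redis-py `redis.Redis` instance.

```python
def stream_maxlen(msgs_per_sec: int, retention_seconds: int,
                  headroom_pct: int = 20) -> int:
    """Entries to keep so that `retention_seconds` of history survives."""
    return msgs_per_sec * retention_seconds * (100 + headroom_pct) // 100


def append_capped(client, stream, fields, maxlen):
    """XADD with approximate trimming (the 'MAXLEN ~' form).

    Approximate trimming lets Redis drop whole internal nodes, which is
    cheaper than trimming to an exact count on every write.
    """
    return client.xadd(stream, fields, maxlen=maxlen, approximate=True)


# Assumed workload: 500 messages per second, one hour of replay window
limit = stream_maxlen(500, 3600)
print(limit)
```

With approximate trimming the stream can run slightly over the cap, so alert on stream length with some slack rather than on the exact limit.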
What I actually do now
These days, I default to Redis Streams for any new messaging use case that requires reliability. The API is slightly more complex than Pub/Sub, but not by much, and the reliability guarantees are worth it. I keep Pub/Sub around for the ephemeral stuff — cache invalidation, real-time presence, that kind of thing.
For particularly critical messaging (payment processing, order fulfillment), I've moved away from Redis entirely and use dedicated message brokers. Redis is fantastic at many things, but it's not optimized for disk-based persistence of high-volume message queues. If you need messages to survive a full Redis restart with zero loss, you need to configure AOF persistence with appendfsync always, which tanks write performance. At that point, something like Kafka or Pulsar makes more sense.
But for the vast middle ground — where message loss would be annoying or costly but not catastrophic, and where you want to stay within the Redis ecosystem you already know — Streams hits a sweet spot. It's been reliable enough for me in production, and the operational simplicity of not introducing a new infrastructure component has real value.
The original mistake I made with Pub/Sub wasn't really about the technology. It was about not reading the fine print, about assuming that "messaging" implied "message delivery guarantees." Redis Pub/Sub makes no such guarantees, and it doesn't pretend to. Once you understand that, you can use it appropriately and reach for Streams when you need more.