Troubleshooting Common RabbitMQ Configuration Issues

Find and fix RabbitMQ exchange, queue, binding, acknowledgement, and permission mistakes without chasing false leads.

Most RabbitMQ configuration problems look like application bugs at first. A publisher says it sent the message. A consumer says it never saw it. The queue graph is empty, or worse, it is full and nobody knows why. The fastest way out is to stop guessing and follow the message path: publisher, exchange, binding, queue, consumer, acknowledgement.

RabbitMQ is strict about topology. A direct exchange does not "mostly" match a routing key. A queue declared as exclusive will not behave like a shared work queue. A message published as mandatory is returned to the publisher when it cannot be routed, while the same unroutable message without mandatory is silently dropped unless the exchange has an alternate exchange configured. These details are small until they cost you an afternoon.

Start with the actual route

For a normal AMQP publish, the producer sends a message to an exchange with a routing key. The exchange uses its type and bindings to decide which queues should receive the message. The broker then pushes deliveries from those queues to subscribed consumers, which acknowledge them after processing.

When a message disappears, ask four questions:

  • Did the producer publish to the exchange and virtual host you think it did?
  • Does that exchange exist, and is it the type you think it is?
  • Is there a binding from that exchange to the intended queue?
  • Does the routing key match that binding for the exchange type?

That sounds basic, but it catches a lot of real incidents. Staging and production often use different virtual hosts. A deployment script may declare orders.created in one environment and order.created in another. A queue may be bound to a topic pattern that misses one extra word.

Use the management UI or CLI to inspect the live broker, not the code you hope is running:

rabbitmqctl list_exchanges name type durable auto_delete
rabbitmqctl list_bindings source_name source_kind destination_name destination_kind routing_key
rabbitmqctl list_queues name durable auto_delete exclusive messages_ready messages_unacknowledged consumers

If you use multiple virtual hosts, include -p:

rabbitmqctl list_bindings -p production source_name destination_name routing_key

Routing-key mismatches

Direct exchanges require an exact binding-key match. If a queue is bound with invoice.created, a message published with invoices.created will not arrive. RabbitMQ will not correct pluralization, case, dots, or dashes.

Topic exchanges use * for exactly one word and # for zero or more words. The word separator is a dot. A binding of logs.* matches logs.info, but not logs.app.info or logs on its own. A binding of logs.# matches all three.
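When a topic binding is in doubt, it can help to test the pattern against the exact routing key outside the broker. The function below is a minimal sketch of the matching rules described above, not RabbitMQ's own implementation; the name topic_matches is made up for this example.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key."""
    p = pattern.split(".") if pattern else []
    k = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p):
            # Pattern exhausted: match only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            # Pattern still expects a word, but the key has none left.
            return False
        # '*' matches exactly one word; anything else must match literally.
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)


print(topic_matches("logs.*", "logs.app.info"))  # False
print(topic_matches("logs.#", "logs.app.info"))  # True
```

Paste the binding key from list_bindings and the routing key from the publisher's logs into it. If it returns False, the broker is behaving as configured and the binding is the thing to change.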

A useful troubleshooting trick is to add a temporary diagnostic queue with a broad binding, then publish a known test message:

rabbitmqadmin declare queue name=debug.routing durable=false auto_delete=true
rabbitmqadmin declare binding source=events destination=debug.routing destination_type=queue routing_key='#'

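The '#' binding catches everything only on a topic exchange. On a direct exchange it is a literal key, so bind with the exact routing key you expect instead. Do this carefully in production, and delete the diagnostic queue and binding when finished: an auto-delete queue is removed only after a consumer has attached and left, so a queue that is merely inspected keeps accumulating messages. The goal is to prove whether messages are reaching the exchange at all.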

For important publishers, enable publisher returns with the mandatory flag so unroutable messages are visible to the publisher. Publisher confirms tell you the broker accepted the publish; returns tell you the exchange could not route it to any queue. They answer different questions.
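The difference between a confirm and a return is easier to see in code. The sketch below assumes the pika client and a BlockingChannel on which confirm_delivery() has already been called; publish_checked is a hypothetical helper name. The fallback exception classes exist only so the sketch can be read and exercised without pika installed.

```python
try:
    from pika.exceptions import NackError, UnroutableError
except ImportError:
    # Stand-ins for reading or testing this sketch without pika installed.
    class UnroutableError(Exception):
        pass

    class NackError(Exception):
        pass


def publish_checked(channel, exchange: str, routing_key: str, body: bytes) -> str:
    """Publish once and report what the broker said about the message.

    Assumes `channel` is a pika BlockingChannel in confirm mode,
    i.e. channel.confirm_delivery() was called after opening it.
    """
    try:
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=body,
            mandatory=True,  # ask the broker to return unroutable messages
        )
    except UnroutableError:
        return "returned: no queue matched this routing key"
    except NackError:
        return "nacked: the broker did not accept the message"
    return "confirmed: accepted and routed to at least one queue"
```

A "returned" result points at bindings and routing keys. A "nacked" result points at the broker itself. Logging the two separately keeps a routing mistake from being misread as a broker outage.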

Exchange type mistakes

Exchange type mistakes are a common source of confusion because redeclaring an existing exchange with different properties fails. If one service declares events as topic and another declares events as direct, the second declaration fails with a PRECONDITION_FAILED (406) error that closes the channel.

That failure is good. It prevents two applications from silently disagreeing about routing. The fix is not to catch and ignore the exception. The fix is to make topology ownership clear. Usually one deployment step or infrastructure module should declare shared exchanges and queues, while applications only declare private reply queues or idempotently assert the expected topology.

Fanout exchanges ignore routing keys. Headers exchanges route by headers, not routing keys. If your test message has the right routing key but no queue receives it, check the exchange type before editing every binding.
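To make the difference concrete, here is a simplified model of how one binding is evaluated under each of these exchange types. It is an illustration only: the routes function and the binding dictionary shape are invented for this sketch, and the headers logic covers just the common x-match values "all" and "any".

```python
def routes(exchange_type, binding, routing_key="", headers=None):
    """Return True if one binding would receive the message.

    `binding` is a dict with an optional "key" and optional "args",
    a hypothetical shape chosen for this sketch.
    """
    headers = headers or {}
    if exchange_type == "fanout":
        # Every bound queue gets a copy; the routing key is ignored.
        return True
    if exchange_type == "direct":
        # Exact string comparison, nothing fuzzy.
        return binding.get("key", "") == routing_key
    if exchange_type == "headers":
        args = dict(binding.get("args", {}))
        mode = args.pop("x-match", "all")
        # Binding arguments starting with "x-" are not used for matching here.
        wanted = {k: v for k, v in args.items() if not k.startswith("x-")}
        hits = [headers.get(k) == v for k, v in wanted.items()]
        return any(hits) if mode == "any" else all(hits)
    raise ValueError(f"unsupported exchange type: {exchange_type}")


print(routes("fanout", {"key": "a"}, "b"))                              # True
print(routes("direct", {"key": "invoice.created"}, "invoices.created")) # False
```

The same binding gives different answers depending on the type argument, which is why the exchange type is worth confirming before any binding is edited.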

Queue properties that surprise people

Durable means the queue definition survives a broker restart. It does not make every message inside the queue survive. For messages to survive a restart, the queue must be durable and the message must be published as persistent. Even then, publishers should use confirms if they need to know when RabbitMQ has accepted the message safely.

Auto-delete queues are removed when their last consumer cancels or disconnects; a queue that never had a consumer is not removed. They are useful for temporary subscriptions, but they are a bad fit for shared work queues. Exclusive queues are scoped to the connection that declares them and disappear when that connection closes. They are useful for reply queues and private consumers, not for multiple worker instances.

If a queue seems to "randomly vanish," check these flags:

rabbitmqctl list_queues name durable auto_delete exclusive consumers

Also check whether application code declares the queue on startup with different properties than the existing queue. RabbitMQ treats the durable, exclusive, and auto-delete flags, along with optional arguments such as queue type, dead-letter exchange, and max length, as part of the declaration contract. A mismatch closes the channel with a PRECONDITION_FAILED error.
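A quick way to spot such a mismatch before a deploy is to compare what the broker reports with what the code is about to request. The helper below is a rough pre-check, assuming both sides have been flattened into plain dictionaries. RabbitMQ's own equivalence check has its own rules, and declaration_conflicts is a hypothetical name.

```python
def declaration_conflicts(existing: dict, requested: dict) -> dict:
    """Return {property: (existing_value, requested_value)} for every mismatch."""
    keys = set(existing) | set(requested)
    return {
        k: (existing.get(k), requested.get(k))
        for k in sorted(keys)
        if existing.get(k) != requested.get(k)
    }


# What the broker has (for example, read from list_queues name durable arguments).
existing = {"durable": True, "x-queue-type": "quorum", "x-dead-letter-exchange": "jobs.dlx"}
# What the application is about to declare.
requested = {"durable": True, "x-queue-type": "classic"}

for name, (have, want) in declaration_conflicts(existing, requested).items():
    print(f"{name}: broker has {have!r}, code requests {want!r}")
```

An empty result does not guarantee the declaration will succeed, but a non-empty one is a strong hint about where a precondition failure will come from.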

Messages are ready, but consumers do nothing

If messages_ready is high and consumers is zero, RabbitMQ is waiting. The consumer application may be down, connected to the wrong virtual host, using the wrong queue name, or blocked by permissions.

If consumers are connected but deliveries are not happening, check prefetch and consumer capacity:

rabbitmqctl list_consumers queue_name channel_pid consumer_tag ack_required prefetch_count active

A consumer with manual acknowledgements and a full prefetch window will not receive more messages until it acks or nacks some of the messages it already has. This often looks like RabbitMQ stopped delivering, when the consumer is actually holding unacknowledged work.
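The prefetch window can be modelled in a few lines. This is a toy simulation of one consumer, not broker code; PrefetchWindow and its methods are names invented for the example.

```python
from collections import deque


class PrefetchWindow:
    """Toy model of one consumer with manual acks and a prefetch limit."""

    def __init__(self, prefetch: int, ready: int):
        self.prefetch = prefetch
        self.ready = deque(range(1, ready + 1))  # delivery tags waiting in the queue
        self.unacked = set()                     # delivered but not yet acknowledged

    def deliver(self) -> list:
        """Deliver as many messages as the open window allows."""
        sent = []
        while self.ready and len(self.unacked) < self.prefetch:
            tag = self.ready.popleft()
            self.unacked.add(tag)
            sent.append(tag)
        return sent

    def ack(self, tag: int) -> None:
        """Acknowledge one delivery, freeing one slot in the window."""
        self.unacked.discard(tag)


window = PrefetchWindow(prefetch=2, ready=5)
print(window.deliver())  # [1, 2]
print(window.deliver())  # []  (window is full, nothing more is delivered)
window.ack(1)
print(window.deliver())  # [3]
```

The second call returns nothing even though three messages are ready. That is the state that looks like a stalled broker from the outside.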

If messages_unacknowledged is high, look at the consumer logs and downstream systems. A slow database, a stuck HTTP dependency, or a handler that catches exceptions without acking can all create a wall of unacked messages.
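The readings above can be folded into a small triage helper. It is a sketch of the reasoning in this section rather than a diagnostic tool; the triage function is hypothetical, and it expects one row of counters such as rabbitmqctl list_queues can emit with --formatter json.

```python
def triage(queue: dict) -> str:
    """Map one row of queue counters to the most likely place to look."""
    ready = queue.get("messages_ready", 0)
    unacked = queue.get("messages_unacknowledged", 0)
    consumers = queue.get("consumers", 0)

    if ready and not consumers:
        return "no consumers: check the consumer process, vhost, queue name, permissions"
    if unacked and not ready:
        return "all work is in flight: check consumer logs and downstream systems"
    if ready and unacked:
        return "consumers are saturated: check prefetch, handler latency, missing acks"
    if ready:
        return "consumers attached but idle: check list_consumers for prefetch and active"
    return "queue is empty: check publishing and routing instead"


print(triage({"messages_ready": 500, "messages_unacknowledged": 0, "consumers": 0}))
```

The labels are only starting points. Their value is in deciding which log to open first.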

Acknowledgement bugs

Manual acknowledgements are the normal choice for reliable processing. The consumer should ack only after the work is complete. If it fails, it should reject or nack with a deliberate requeue decision.

The dangerous pattern is automatic acknowledgement (auto_ack=True in pika) for work that can fail. With automatic acknowledgements, RabbitMQ considers the message handled as soon as it is delivered. If the consumer crashes after receiving it, the message is gone from the queue.

The opposite bug is never acking. The consumer processes the message successfully, maybe even writes to a database, but forgets basic_ack. RabbitMQ keeps the delivery unacked until the channel closes, then redelivers it. That creates duplicate work and growing unacked counts.

A simple handler shape is easier to audit:

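class RetryableError(Exception):
    """Transient failure that is worth another delivery attempt."""

def handle(ch, method, properties, body):
    # pika-style callback; process() stands for the application's own logic.
    try:
        process(body)
    except RetryableError:
        # Transient failure: put the message back on the queue.
        ch.basic_nack(method.delivery_tag, requeue=True)
    except Exception:
        # Permanent failure: discard, or dead-letter if the queue has a DLX.
        ch.basic_nack(method.delivery_tag, requeue=False)
    else:
        # Ack only after the work has completed.
        ch.basic_ack(method.delivery_tag)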

If you requeue every failure forever, one bad message can loop endlessly. Use a dead-letter exchange or retry design for poison messages.

Permissions and virtual hosts

RabbitMQ permissions are scoped per virtual host. A user may be able to connect but still lack configure, write, or read permissions for a queue or exchange. That shows up as an ACCESS_REFUSED (403) channel exception in the client logs, not always as a friendly application error.

Check permissions directly:

rabbitmqctl list_permissions -p production
rabbitmqctl list_user_permissions app_user

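For a service that only publishes, grant write permissions to the exchange pattern it needs and avoid broad configure rights. For a consumer, grant read on the queue. Acking and nacking need no extra permission, because the broker performs dead-lettering itself. A consumer needs write only if it publishes messages of its own, such as replies or explicit retries, and a user that declares a queue with x-dead-letter-exchange needs write on that exchange. Overly broad permissions make troubleshooting easier today and security reviews harder tomorrow.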
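Each of the three permissions (configure, write, read, in that order for set_permissions) is a regular expression matched against resource names. A subtle trap is that an unanchored pattern matches anywhere in the name. The sketch below approximates that check with Python's re module. RabbitMQ uses Erlang's regex engine, so treat this as an illustration; permitted is a made-up helper.

```python
import re


def permitted(pattern: str, resource_name: str) -> bool:
    """Approximate RabbitMQ's check of a resource name against a permission regex."""
    if pattern == "":
        # The empty string is treated as '^$', which grants access to no named resource.
        pattern = "^$"
    # Unanchored search: the pattern may match anywhere in the name.
    return re.search(pattern, resource_name) is not None


print(permitted(r"^orders\..*$", "billing.orders.created"))  # False
print(permitted("orders", "billing.orders.created"))         # True (unanchored)
```

If a service can touch a queue it should not, look for a pattern missing its ^ and $ anchors before suspecting anything more exotic.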

Dead-letter configuration mistakes

Dead-letter exchanges are supposed to make failures visible. Misconfigured dead-lettering does the opposite: messages fail, get rejected, and then vanish into an exchange that has no binding.

Check the queue arguments, not just the queue name:

rabbitmqctl list_queues name arguments

For a queue that should dead-letter failed jobs, you should see arguments such as x-dead-letter-exchange and, sometimes, x-dead-letter-routing-key. Then inspect that exchange and its bindings the same way you inspect the main route.

A common mistake is to configure a dead-letter exchange called jobs.dlx but bind the dead-letter queue to jobs.failed on a different exchange. Another is to set x-dead-letter-routing-key to a value that no binding matches. RabbitMQ routes the dead-lettered message through the dead-letter exchange like any other publish, using x-dead-letter-routing-key if it is set and the message's original routing key otherwise. If nothing matches, or the exchange does not exist, the message is silently dropped.
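Both mistakes can be checked mechanically once the queue arguments and bindings are in hand. The following is a minimal sketch that assumes the dead-letter exchange is a direct exchange; the function name and the (exchange, routing_key, queue) tuple shape are choices made for this example.

```python
def dead_letter_destinations(queue_args: dict, original_key: str, bindings: list) -> list:
    """Return the queues a dead-lettered message would reach via a direct DLX.

    `bindings` is a list of (exchange, routing_key, queue) tuples,
    a hypothetical shape chosen for this sketch.
    """
    dlx = queue_args.get("x-dead-letter-exchange")
    if not dlx:
        return []  # no dead-lettering configured at all
    # Without an explicit dead-letter key, the original routing key is reused.
    key = queue_args.get("x-dead-letter-routing-key", original_key)
    return [q for (exchange, routing_key, q) in bindings if exchange == dlx and routing_key == key]


bindings = [
    ("jobs.dlx", "jobs.failed", "jobs.dead"),
    ("jobs", "jobs.failed", "jobs.audit"),  # right key, wrong exchange
]
args = {"x-dead-letter-exchange": "jobs.dlx", "x-dead-letter-routing-key": "jobs.failed"}
print(dead_letter_destinations(args, "jobs.new", bindings))  # ['jobs.dead']
```

An empty list is the signal to worry about: rejected messages would leave the main queue and arrive nowhere.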

Retry queues need the same care. If you build retry with TTL plus dead-lettering, draw the route on paper:

main queue -> reject (requeue=false) -> retry exchange -> retry queue -> TTL expires -> main exchange -> main queue

Then verify every exchange, queue, binding, and routing key. Retry loops are easy to create by accident. Put a cap on attempts in message headers or application state so one broken payload does not spin forever.
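One way to cap attempts is to read the x-death header that RabbitMQ adds to dead-lettered messages. Each entry records a queue, a reason, and a count. The sketch below assumes the headers arrive as a plain dictionary, as they do in pika; death_count, should_retry, and the default of five attempts are illustrative choices, not recommendations.

```python
def death_count(headers, queue_name: str) -> int:
    """Sum the x-death counts recorded for one queue."""
    deaths = (headers or {}).get("x-death") or []
    return sum(
        int(entry.get("count", 0))
        for entry in deaths
        if entry.get("queue") == queue_name
    )


def should_retry(headers, queue_name: str, max_attempts: int = 5) -> bool:
    """Return True while the message has died fewer than max_attempts times."""
    return death_count(headers, queue_name) < max_attempts


headers = {
    "x-death": [
        {"queue": "jobs", "reason": "rejected", "count": 3},
        {"queue": "jobs.retry", "reason": "expired", "count": 3},
    ]
}
print(should_retry(headers, "jobs", max_attempts=5))  # True
print(should_retry(headers, "jobs", max_attempts=3))  # False
```

When should_retry returns False, the handler can publish the message to a parking queue and ack it instead of rejecting it back into the loop.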

Policy surprises

Policies can change queue behavior without the application code mentioning it. A policy may set maximum length, overflow behavior, TTL, dead-letter exchange, or other optional arguments. Properties fixed at declaration time, such as queue type, cannot be changed by a policy. Policies are useful for operations, but they can confuse debugging when a queue behaves differently from the code declaration.

List policies during troubleshooting:

rabbitmqctl list_policies

Look at the pattern and priority. A broad policy like .* can affect queues created later by unrelated teams. If a queue is dropping older messages, check for max-length or overflow settings. If messages expire sooner than expected, check for queue-level TTL and per-message expiration.

When the application declares one set of arguments and a policy applies another, RabbitMQ's rules depend on the setting. For example, when both specify a maximum length, the lower value applies, while a dead-letter exchange set in queue arguments overrides one set by policy. Some optional arguments can be controlled by policy; others must match the declaration. The safe operational habit is to keep queue behavior in one obvious place and document any policy that intentionally overrides application defaults.

When producers and consumers declare topology

Many client libraries make it easy for every service to declare exchanges, queues, and bindings on startup. That can be convenient in development. In production, it can create ownership problems.

If both producer and consumer declare the same queue, they must agree on every important property. If one deployment changes a queue from non-durable to durable, or changes a dead-letter argument, the next service to start may fail with a precondition error. That is better than silent drift, but it can still break a deploy.

For shared topology, prefer one owner: Terraform, Ansible, a migration job, or one clearly responsible service. Application startup can still assert that the expected topology exists, but it should not casually create shared queues with defaults that nobody reviewed.

Private topology is different. A service creating a temporary reply queue or an exclusive subscription queue can own that queue directly. The difference is whether another service depends on the queue name and behavior.

Keep one known-good publish path

For important systems, keep a tiny diagnostic publisher or runbook command that sends a harmless message through the expected exchange, routing key, and virtual host. It should use the same kind of credentials as the real application, or at least a permission set close enough to catch routing and access problems.

That known-good path is useful during deployments. If the diagnostic message routes but the application message does not, compare the actual routing key, headers, and virtual host from the app. If the diagnostic message fails too, the problem is probably topology, permissions, or broker state.

A practical incident checklist

When messages are missing or stuck, collect the live state before restarting everything. Connections are listed for the whole broker, so read the vhost column instead of passing -p:

rabbitmqctl list_queues -p production name messages_ready messages_unacknowledged consumers state
rabbitmqctl list_bindings -p production source_name destination_name routing_key
rabbitmqctl list_exchanges -p production name type durable
rabbitmqctl list_connections name user vhost state

Then send one known test message with a unique ID and trace it through logs. Do not test with a random production message whose path is already unclear.
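A test message is only traceable if it carries an identifier that appears in every log along the way. The helper below is one possible shape for such a probe. The field names, the x-probe header, and make_probe itself are assumptions for this sketch, and the properties dictionary is meant to be passed to a client's property type, such as pika.BasicProperties(**properties).

```python
import json
import time
import uuid


def make_probe(source: str = "runbook") -> tuple:
    """Build a harmless, uniquely identifiable test message."""
    probe_id = str(uuid.uuid4())
    # The body repeats the ID so it is visible even where properties are not logged.
    body = json.dumps({"probe": True, "id": probe_id, "source": source}).encode()
    properties = {
        "message_id": probe_id,
        "timestamp": int(time.time()),
        "content_type": "application/json",
        "headers": {"x-probe": "true"},  # lets consumers recognize and skip probes
    }
    return probe_id, body, properties


probe_id, body, properties = make_probe()
print(f"search logs for {probe_id}")
```

Consumers need to recognize and ignore the probe, so agree on the marker before an incident rather than during one.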

Most RabbitMQ configuration issues are not mysterious once you line up the declared topology with the publisher and consumer behavior. The broker is usually doing exactly what it was told. The work is finding the place where what it was told differs from what the team meant.