Step-by-Step Guide to Deploying a RabbitMQ Active-Passive Cluster

Build a RabbitMQ active-passive setup with clustering, matching Erlang cookies, quorum queues, and a tested failover path.

RabbitMQ high availability needs more than two servers that can see each other. You need clustering for shared metadata, replicated queues for message availability, and a clear failover path for clients.

This guide shows the RabbitMQ side of an active-passive style deployment. Client failover usually comes from a load balancer, DNS change, service discovery, or a virtual IP managed outside RabbitMQ.

Prerequisites for an Active-Passive Cluster

Before beginning the configuration, ensure the following prerequisites are met across all intended cluster nodes (Node A - Active, Node B - Passive):

  1. Compatible Software Versions: Keep RabbitMQ Server and Erlang/OTP versions aligned across nodes. In practice, run the same RabbitMQ version on every node unless you are following RabbitMQ's documented rolling upgrade path.
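  2. Network Accessibility: Nodes must reach each other on the inter-node ports, which by default are 4369 (epmd peer discovery) and 25672 (Erlang distribution, used for clustering). Clients need the AMQP port (5672, or 5671 with TLS), and the management UI listens on 15672 if you enable it.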
  3. Host Resolution: Configure the /etc/hosts file (or DNS) on all nodes so that each node can resolve the hostname of all other nodes reliably.
  4. Cookie Consistency: The Erlang 'magic cookie' must be identical on all nodes. This is crucial for the nodes to trust each other for clustering.
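
A quick reachability check can confirm the networking prerequisite before you touch RabbitMQ. The sketch below assumes the default ports (4369 for epmd, 5672 for AMQP, 25672 for inter-node traffic) and a bash binary with /dev/tcp support; port_open and check_peer are illustrative names, not RabbitMQ tools.

```shell
#!/usr/bin/env bash
# Assumed default ports: epmd, AMQP, inter-node distribution.
# Change the list if rabbitmq.conf overrides any of them.
RABBIT_PORTS="4369 5672 25672"

# port_open HOST PORT: succeeds if a TCP connection opens within 3 seconds.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_peer HOST: prints one line per port and fails if any port is closed.
check_peer() {
  host=$1
  rc=0
  for port in $RABBIT_PORTS; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port CLOSED"
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_peer rabbitmq-node-b on Node A, and the reverse on Node B, should print three open lines once both brokers are running.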

Establishing Cookie Consistency

The Erlang cookie is a shared secret that nodes and CLI tools use to authenticate to each other; it does not encrypt traffic. Every node must hold the same value, so copy it from the first node initialized to all others.

On Node A (The first node):

Locate the cookie file (usually /var/lib/rabbitmq/.erlang.cookie or ~/.erlang.cookie depending on the installation method) and copy its contents.

On Node B (and subsequent nodes):

  1. Stop the RabbitMQ service:
    sudo systemctl stop rabbitmq-server
    
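  2. Replace the existing cookie file with the content copied from Node A. The file must be owned by the user that runs RabbitMQ (rabbitmq on package installs) and readable only by that user (mode 400 or 600).
    # Example using printf (replace the placeholder with Node A's cookie)
    printf '%s' "YOUR_LONG_COOKIE_STRING" | sudo tee /var/lib/rabbitmq/.erlang.cookie > /dev/null
    sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
    sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie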
    
  3. Start the service on Node B:
    sudo systemctl start rabbitmq-server
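
To confirm the cookies match without printing the secret itself, you can compare a hash of the file on each node. This is a minimal sketch that assumes sha256sum is installed and uses the /var/lib/rabbitmq/.erlang.cookie path from above; cookie_fingerprint and COOKIE_FILE are hypothetical names. Line endings are stripped before hashing so a trailing newline on one node does not cause a false mismatch.

```shell
COOKIE_FILE=${COOKIE_FILE:-/var/lib/rabbitmq/.erlang.cookie}

# cookie_fingerprint [FILE]: prints the SHA-256 digest of the cookie,
# ignoring carriage returns and newlines.
cookie_fingerprint() {
  tr -d '\r\n' < "${1:-$COOKIE_FILE}" | sha256sum | cut -d' ' -f1
}
```

Run it from a root shell on Node A and Node B; the two digests should be identical.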
    

Step 1: Configuring Hostnames and Networking

Ensure that /etc/hosts on both Node A and Node B maps each node's hostname to the correct IP address.

Example /etc/hosts (on both servers):

192.168.1.10   rabbitmq-node-a
192.168.1.11   rabbitmq-node-b
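
Because both servers need identical mappings, it can help to check the file programmatically. The sketch below looks up a name in a hosts-format file; lookup_in_hosts is an illustrative helper, not a standard tool. To test what the system resolver actually returns, including DNS, getent hosts rabbitmq-node-b is the usual check on Linux.

```shell
# lookup_in_hosts NAME [FILE]: prints the first IP address mapped to NAME.
# FILE defaults to /etc/hosts. Comment lines and inline comments are ignored.
lookup_in_hosts() {
  awk -v name="$1" '
    /^[ \t]*#/ { next }
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break
        if ($i == name) { print $1; exit }
      }
    }
  ' "${2:-/etc/hosts}"
}
```

On either server, lookup_in_hosts rabbitmq-node-a and lookup_in_hosts rabbitmq-node-b should print the addresses from the example above; empty output means the entry is missing.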

Step 2: Initializing the First Cluster Node (Active)

Node A will be the initial primary node, where the cluster is first established.

  1. Start the service on Node A (if not already running):
    sudo systemctl start rabbitmq-server
    
  2. Verify Status: Ensure the node is running correctly.
    sudo rabbitmqctl status
    

Step 3: Joining the Second Node (Passive) to the Cluster

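Next, join Node B to the cluster that Node A belongs to. RabbitMQ cluster nodes are peers, so Node A is simply the existing member that Node B contacts in order to join.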

  1. Stop the RabbitMQ application on Node B while keeping the Erlang node available:

    sudo rabbitmqctl stop_app
    
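  2. Reset Node B's local state if it has already been initialized as a standalone node. This erases the node's existing data (users, virtual hosts, queues, and messages), so export anything you need first: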

    sudo rabbitmqctl reset
    
  3. Join Command: Execute the join command on Node B, specifying the hostname of Node A as the peer.

    sudo rabbitmqctl join_cluster rabbit@rabbitmq-node-a
    

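    Tip: Use the node name exactly as Node A reports it in rabbitmqctl status. The host part after rabbit@ must match Node A's hostname and resolve through /etc/hosts or DNS on Node B.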

  4. Start the RabbitMQ application on Node B:

    sudo rabbitmqctl start_app
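
If you join nodes often, the sequence above can be wrapped so that it stops at the first failing command. This sketch assumes rabbitmqctl is run through sudo, as in the steps above; join_node and the RABBITMQCTL override variable are illustrative names.

```shell
# Command prefix used for every call; override it for dry runs or testing.
RABBITMQCTL=${RABBITMQCTL:-sudo rabbitmqctl}

# join_node PEER [reset]: runs the join sequence on the local node and
# stops at the first failure. Pass "reset" to wipe local state first.
join_node() {
  peer=$1
  $RABBITMQCTL stop_app || return 1
  if [ "${2:-}" = "reset" ]; then
    $RABBITMQCTL reset || return 1   # destructive: erases this node's data
  fi
  $RABBITMQCTL join_cluster "$peer" || return 1
  $RABBITMQCTL start_app
}
```

On Node B, join_node rabbit@rabbitmq-node-a reset would perform steps 1 through 4 in order. Setting RABBITMQCTL=echo first prints the commands instead of running them.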
    

Step 4: Verifying Cluster Formation

Log into Node A and verify that both nodes recognize each other.

sudo rabbitmqctl cluster_status

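Expected Output Snippet:

Both rabbit@rabbitmq-node-a and rabbit@rabbitmq-node-b should be listed as running. The snippet below shows the Erlang-term format printed by releases before RabbitMQ 3.8, where they appear under running_nodes. RabbitMQ 3.8 and later print the same information under "Disk Nodes" and "Running Nodes" headings.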

Cluster status of node rabbit@rabbitmq-node-a ...
[{nodes,[{disc,[rabbit@rabbitmq-node-a,rabbit@rabbitmq-node-b]}]},
 {running_nodes,[rabbit@rabbitmq-node-a,rabbit@rabbitmq-node-b]},
 ...
]
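
For scripted checks, rabbitmqctl also provides await_online_nodes, which blocks until the given number of nodes is online or the command times out. A minimal sketch, with wait_for_cluster and RABBITMQCTL as illustrative names:

```shell
# Command prefix used for every call; override it for dry runs or testing.
RABBITMQCTL=${RABBITMQCTL:-sudo rabbitmqctl}

# wait_for_cluster COUNT: waits until COUNT nodes are online,
# then prints the cluster status. Fails if the wait times out.
wait_for_cluster() {
  $RABBITMQCTL await_online_nodes "$1" || return 1
  $RABBITMQCTL cluster_status
}
```

For the two-node setup in this guide, wait_for_cluster 2 returns once both nodes are online.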

Step 5: Configuring High Availability for Queues

Standard RabbitMQ clustering shares metadata such as users, exchanges, bindings, and policies. Queue contents need a replicated queue type if you want messages to survive node failure.

For modern RabbitMQ deployments (3.8 and later), use quorum queues for replicated durable queues. Classic mirrored queues used ha-mode policies in older RabbitMQ releases, but that approach has been deprecated since RabbitMQ 3.9 and was removed in RabbitMQ 4.0.

Declare a Quorum Queue

You can declare quorum queues from your application or with rabbitmqadmin. This example creates a durable quorum queue:

rabbitmqadmin declare queue name=orders durable=true arguments='{"x-queue-type":"quorum"}'
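
After declaring the queue, you may want to confirm that the broker actually created it as a quorum queue. The sketch below assumes RabbitMQ 3.8 or later, where rabbitmqctl list_queues accepts the type column, and it checks the default virtual host; queue_type and RABBITMQCTL are illustrative names.

```shell
# Command prefix used for every call; override it for testing.
RABBITMQCTL=${RABBITMQCTL:-sudo rabbitmqctl}

# queue_type NAME: prints the type of queue NAME (for example quorum or classic).
# list_queues output is tab-separated; rows that do not match NAME,
# such as a header row, are ignored.
queue_type() {
  $RABBITMQCTL -q list_queues name type | awk -F'\t' -v q="$1" '$1 == q { print $2 }'
}
```

For the queue declared above, queue_type orders should print quorum. Empty output means the queue does not exist in the default virtual host.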

A two-node lab cluster can host a quorum queue, but the queue needs a majority of its members online, and with two members the majority is two. Losing either node therefore makes the queue unavailable until that node returns. For production, use at least three RabbitMQ nodes for quorum queues so one node can fail while the queue still has a majority.
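
The majority rule is simple arithmetic: a queue with N members needs floor(N/2) + 1 of them online. A small sketch (the function names are illustrative) makes the difference between two and three nodes concrete:

```shell
# quorum_majority N: members that must be online for a queue with N members.
quorum_majority() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: members that can be lost while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

tolerated_failures 2 prints 0, tolerated_failures 3 prints 1, and tolerated_failures 5 prints 2, which is why odd cluster sizes of three or more are the usual choice.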

Step 6: Test Failover

Before calling the cluster ready, test the path your clients will use:

  1. Publish a few persistent test messages to a quorum queue.
  2. Stop the active node's RabbitMQ application with sudo rabbitmqctl stop_app.
  3. Confirm clients reconnect through your load balancer, DNS target, or service discovery setup.
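  4. Consume the test messages from a surviving node. This only succeeds while a majority of the queue's members are online, so in a two-node cluster the quorum queue stays unavailable until the stopped node returns.
  5. Start the stopped application again with sudo rabbitmqctl start_app and run sudo rabbitmqctl cluster_status to confirm that the node has rejoined.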
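
During the test it is useful to measure how long clients are locked out. The sketch below polls a command until it succeeds and reports how many attempts that took; wait_until is an illustrative helper, and the endpoint in the usage note is a hypothetical name for whatever address your clients use.

```shell
# wait_until TRIES DELAY CMD...: re-runs CMD until it succeeds.
# Prints the number of attempts used; fails if TRIES is exhausted.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "$n"
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_until 60 1 bash -c 'exec 3<>/dev/tcp/rabbitmq.example.internal/5672' retries a TCP connection to the client-facing endpoint once per second for up to a minute. Started right after step 2, the number it prints roughly approximates the outage in seconds.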

Final Takeaway

RabbitMQ clustering gives you shared broker metadata, but queue availability depends on the queue type and client failover design. Use quorum queues for replicated durable queues, keep at least three nodes for real fault tolerance, and test failover with the same connection path your applications use.