Understanding and Executing PostgreSQL Failover vs. Switchover Scenarios

Learn when to use PostgreSQL switchover or failover, how to check replica safety, and how to avoid split-brain during HA events.

The difference between PostgreSQL failover and switchover is not academic when you are the person holding the pager. A switchover is planned: you still have a healthy primary, you can drain writes, and you can choose the best moment to move the writable role to a standby. A failover is what you do when the primary is gone, unreachable, or unsafe to keep serving traffic.

That one difference changes everything. During a switchover, your main job is patience: prove the standby has caught up before promotion. During a failover, your main job is containment: make sure the old primary cannot keep accepting writes after a standby is promoted. Most ugly PostgreSQL HA incidents come from rushing one of those two jobs.

Replication Fundamentals: The Foundation of HA

PostgreSQL High Availability is built upon streaming replication, where one server acts as the Primary (or Master) and one or more servers act as Standbys (or Replicas). The Primary streams write-ahead log (WAL) records to the Standbys to keep them in sync.

To manage these roles effectively, specific configuration settings are necessary on both primary and replica nodes:

Critical Configuration Settings

These settings govern how replication operates and how nodes identify each other:

  • wal_level: Must be set to replica or logical on the Primary. replica is enough for physical streaming replication; choose logical only if you also need logical decoding or logical replication.
  • max_wal_senders: Defines the maximum number of concurrent WAL sender connections. Size it for all physical standbys, base backups, and replication tools that may connect at the same time.
  • hot_standby: Must be on (the default since PostgreSQL 10) in the standby server's postgresql.conf to allow read-only queries while the standby replays WAL.
  • synchronous_commit: Controls when a commit is acknowledged to the client. It provides stronger durability only when synchronous replication is also configured through synchronous_standby_names; by itself, it does not make a standby current.
  • primary_conninfo: Set on the standby, detailing connection information (host, port, user, password) to connect to the current Primary.
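
As a rough illustration of the primary-side settings above, the sketch below checks a dictionary of setting values (for example, rows read from pg_settings) and reports anything that would block streaming replication. The function name, the dictionary shape, and the expected_senders parameter are assumptions made for this example; they are not part of PostgreSQL.

```python
def check_primary_settings(settings, expected_senders):
    """Return a list of human-readable problems; an empty list means the basics look fine.

    `settings` maps setting names to their string values (hypothetical shape).
    `expected_senders` is how many replication connections you plan to run at once.
    """
    problems = []

    # Physical streaming replication needs wal_level = replica or logical.
    wal_level = settings.get("wal_level", "")
    if wal_level not in ("replica", "logical"):
        problems.append(f"wal_level is {wal_level!r}; expected 'replica' or 'logical'")

    # Each concurrent replication connection occupies one WAL sender.
    max_senders = int(settings.get("max_wal_senders", "0"))
    if max_senders < expected_senders:
        problems.append(
            f"max_wal_senders is {max_senders}; need at least {expected_senders}"
        )

    return problems
```

A check like this fits a pre-flight script that runs before a standby is added, where a wrong value is cheap to fix.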

Best Practice: Put a stable endpoint in front of PostgreSQL, such as HAProxy, PgBouncer behind a virtual IP, a service discovery record, or your platform's service abstraction. Applications should not need to know which node is primary today.

Switchover: The Planned Transition

A Switchover is a controlled, graceful process in which the active Primary is intentionally demoted and a designated Standby is promoted to take its place. This procedure is typically used for planned maintenance, version upgrades, or hardware replacements.

Steps for a Controlled Switchover

The goal of a switchover is to ensure zero data loss by waiting for all in-flight transactions to be replicated before promotion.

  1. Stop Writes on the Current Primary: Prevent new transactions from committing on the current Primary. This is often done by setting default_transaction_read_only = on, but sessions can override that setting, so also pause the connection pooler or close client connections.
  2. Wait for Replication Catch-up: Ensure the designated Standby has received and replayed all remaining WAL from the Primary. Compare replay_lsn in pg_stat_replication with pg_current_wal_lsn() on the Primary, or examine the standby's recovery status.
  3. Initiate Standby Promotion: With the old Primary stopped or otherwise unable to accept writes, promote the chosen Standby. The command depends on the management tool used (e.g., pg_ctl promote, SELECT pg_promote() on PostgreSQL 12 and later, or a cluster manager command).
  4. Reconfigure Old Primary: Once the Standby is successfully promoted, the old Primary must be restarted as a Standby of the new Primary. On PostgreSQL 12 and later this means creating standby.signal in its data directory and pointing primary_conninfo at the new Primary.
  5. Redirect Applications: Update the load balancer or connection pooler to direct traffic to the new Primary server.

A practical switchover checklist usually looks more ordinary than dramatic. Announce a short write pause, stop background workers that keep writing, put the application in maintenance mode or drain the writer pool, and then check the replication position. On the old primary, pg_stat_replication shows whether the standby has received and flushed WAL. On the standby, pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() help you see whether WAL has merely arrived or has actually been replayed.

Do not promote a standby just because it is connected. A standby can be connected and still seconds or minutes behind if it is replaying a large transaction, waiting on disk I/O, or recovering after a network pause. For a planned switchover, you want replay caught up before promotion. If read-only sessions are running on the standby, also check whether long-running queries are delaying WAL replay.
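
To make "replay caught up" concrete, here is a minimal sketch that compares LSN strings numerically. It assumes you have already fetched pg_current_wal_lsn() from the primary and pg_last_wal_replay_lsn() from the standby as text; the function names are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' into a byte position.

    An LSN is two hexadecimal numbers: the high 32 bits, a slash, the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn: str, standby_replay_lsn: str) -> int:
    """Bytes of WAL the standby still has to replay (0 or less means caught up)."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_replay_lsn)


def replay_caught_up(primary_lsn: str, standby_replay_lsn: str) -> bool:
    """True when the standby has replayed everything the primary has written."""
    return replay_lag_bytes(primary_lsn, standby_replay_lsn) <= 0
```

PostgreSQL can do the same subtraction server-side with pg_wal_lsn_diff(). Even with application writes paused, background activity such as checkpoints can still advance the primary's LSN, so it is worth repeating the comparison rather than trusting a single reading.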

After promotion, test the role directly:

SELECT pg_is_in_recovery();

The promoted node should return false. The demoted node, after it has been rebuilt or rewired as a standby, should return true.

The application side deserves the same care. Before the switchover, know how clients discover the writer. If they connect to a DNS name, understand the DNS TTL and whether clients cache addresses longer than expected. If they connect through PgBouncer, decide whether you need to pause the pool, reload it, or restart it. If you use HAProxy, make sure the health check tests writable status, not just whether port 5432 is open. A standby with PostgreSQL running is not a valid write target.
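
A health check that tests writable status can be as small as the sketch below. It assumes a DB-API 2.0 style cursor (for example, one obtained from psycopg) that is already connected to the node being checked; the function name is hypothetical.

```python
def is_write_target(cursor) -> bool:
    """Return True only if the node behind `cursor` has left recovery.

    An open port or a successful login is not enough: a standby accepts
    connections too, but pg_is_in_recovery() returns true there.
    """
    cursor.execute("SELECT pg_is_in_recovery()")
    row = cursor.fetchone()
    return row is not None and row[0] is False
```

Whatever sits in front of PostgreSQL can call a check like this through a small script or HTTP wrapper and route writes only to the node where it returns True.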

I also like to write down the rollback point. Before promotion, you can usually stop, reopen writes on the old primary, and try again later. After promotion, rollback becomes a new role change, not a simple undo. That does not mean promotion is dangerous; it means the operator should know which side of the line they are on.

Failover: The Emergency Response

Failover is an immediate, reactive procedure triggered when the current Primary server fails unexpectedly (e.g., hardware crash, network partition, software error) and cannot be brought back online quickly.

Failover inherently carries a higher risk of data loss because there is no guarantee that the last few committed transactions had time to stream to the Standbys before the failure occurred.

Executing an Emergency Failover

Failover procedures are designed for speed and recovery, often utilizing specialized tooling to automate the promotion.

  1. Determine the Health of the Old Primary: Verify that the original Primary is truly unavailable and not just experiencing a transient network issue. If it might still be running, fence it before promoting anything; this is what prevents dangerous 'split-brain' scenarios.
  2. Select the Best Standby: Choose the Standby that has received the most WAL (the highest receive LSN), because it has the least data to lose.
  3. Promote the Standby: Promote the selected Standby using the promotion command (pg_ctl promote) or the HA tool's equivalent.
  4. Handle Data Loss (If Necessary): With asynchronous replication, transactions that committed on the failed Primary but never reached the promoted Standby are missing from the new Primary. They must be manually reconciled or simply accepted as lost, depending on the application's tolerance.
  5. Reconfigure Former Primary: Once the original Primary is recovered, it must not rejoin as-is. Rewind it with pg_rewind or rebuild it from a fresh base backup of the new Primary, then configure it to follow the new Primary.
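
Choosing the most advanced standby is a numeric comparison, not a string comparison. The sketch below assumes you have collected pg_last_wal_receive_lsn() from each surviving standby into a dictionary keyed by node name; the names and the dictionary shape are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/A000000' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_promotion_candidate(receive_lsns: dict) -> str:
    """Return the name of the standby that has received the most WAL.

    `receive_lsns` maps standby name -> textual LSN. LSNs are compared as
    integers because text ordering is misleading: '0/A000000' sorts after
    '0/10000000' as a string even though it is the smaller position.
    """
    if not receive_lsns:
        raise ValueError("no standbys available for promotion")
    return max(receive_lsns, key=lambda name: lsn_to_int(receive_lsns[name]))
```

For example, choose_promotion_candidate({"standby_a": "0/A000000", "standby_b": "0/10000000"}) picks standby_b, which a naive string sort would get wrong.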

The hard part of failover is not typing pg_ctl promote. The hard part is deciding that the old primary must be treated as unsafe until proven otherwise. If the old primary is still running but cut off from the application or from the standby, you can get split-brain: two writable PostgreSQL servers accepting different histories. Once that happens, PostgreSQL will not merge the histories for you. You are looking at manual data reconciliation or restoring one side from backup.

In a real incident, I would rather spend one extra minute fencing the old primary than spend the next day explaining why two order records disagree. Fencing can mean powering off the old VM, detaching its network interface, disabling the writer endpoint, or using a cloud/provider mechanism that guarantees the old host cannot receive writes. The exact method depends on your infrastructure, but the requirement is simple: before clients write to the new primary, the old primary must not be writable by those clients.

After failover, expect cleanup. If the old primary comes back, do not casually point it at the new primary and hope it catches up. It may contain WAL that never reached the new primary, which means its history has diverged from the new timeline. In many environments the safest path is pg_rewind if the prerequisites are met (wal_log_hints = on or data checksums enabled, and a cleanly shut down old primary), or a fresh base backup from the new primary if they are not.

One detail that gets missed during emergency work is the replication slot story. If the old primary used physical replication slots for standbys, those slots do not magically move with the promoted standby unless your HA tooling manages them. After failover, check whether the new primary has the slots your surviving standbys need, and check whether any abandoned slot is retaining WAL forever. A forgotten slot can fill a disk hours after the visible outage is over.
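
A post-failover slot check can be sketched as follows. It assumes you have read slot_name, active, and restart_lsn from pg_replication_slots and the current WAL position from pg_current_wal_lsn() on the new primary; the function names and the threshold are illustrative choices, not recommendations.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/5000000' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slots_retaining_wal(slots, current_lsn, threshold_bytes):
    """Return (slot_name, retained_bytes) for inactive slots holding back WAL.

    `slots` is an iterable of (slot_name, active, restart_lsn) rows.
    A slot whose restart_lsn is None has not reserved any WAL and is skipped.
    """
    current = lsn_to_int(current_lsn)
    flagged = []
    for name, active, restart_lsn in slots:
        if restart_lsn is None:
            continue
        retained = current - lsn_to_int(restart_lsn)
        # Active slots are expected to hold some WAL; inactive ones are the risk.
        if not active and retained >= threshold_bytes:
            flagged.append((name, retained))
    return flagged
```

Running a check like this on a schedule after the incident is one way to catch the abandoned slot before the disk does.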

Use the same discipline for backups. Once the cluster has a new primary, confirm that backups and WAL archiving now follow that primary. A failover that restores service but silently stops backups is only half a recovery.

Tools for Safe Promotion: Repmgr vs. Patroni

While manual promotion using pg_ctl is possible, robust HA environments rely on dedicated tools to manage the complex choreography required for failover and switchover, automatically handling configuration changes and cluster state management.

Repmgr (Replication Manager)

repmgr is a lightweight tool that helps register nodes, monitor replication, and perform controlled role changes. Exact commands depend on the version and cluster layout, but the common pattern is:

  • Switchover: Run a planned repmgr standby switchover from the standby that should become primary, after confirming replication health.
  • Failover: Let repmgrd perform automatic failover only if fencing and witness/quorum behavior are understood and tested.

Patroni

Patroni uses a Distributed Configuration Store (DCS) such as etcd, ZooKeeper, or Consul to hold cluster state and a leader lock, and it elects a new Primary automatically when the leader fails. Patroni largely automates both switchovers and failovers through patronictl, its REST API, or Kubernetes operators built on it, drastically reducing manual intervention.

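Example using Patroni (Conceptual Switchover Request):

# Triggering a switchover via Patroni's REST API (it listens on port 8008 by default).
# "leader" is the current primary and "candidate" is the standby to promote; both names are placeholders.
curl -X POST http://patroni-api-endpoint/switchover -H "Content-Type: application/json" -d '{"leader": "current_primary_name", "candidate": "standby_node_name"}'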

Warning on Split-Brain: The greatest danger during automated failover is the 'split-brain' scenario, where two nodes both believe they are the Primary because of a network partition. Tools like Patroni mitigate this with a leader lock in the DCS that expires when the Primary cannot renew it (optionally backed by a watchdog), while manual setups require strict fencing mechanisms (like power controls) to ensure only one Primary exists.

Summary of Differences

| Feature | Switchover (Planned) | Failover (Emergency) |
|---|---|---|
| Trigger | Maintenance, upgrade, administrative choice | Primary failure (crash, outage) |
| Data Loss Risk | Near zero (if replay is confirmed before promotion) | Low with synchronous replication; recent commits can be lost with asynchronous replication |
| Downtime Expectation | Short, controlled downtime | Immediate, reactive downtime |
| Preparation | Requires prior coordination and WAL sync confirmation | Requires immediate action and reliance on Standby health |

A Small Runbook You Can Adapt

For a planned switchover, a compact runbook might read like this:

  1. Confirm the chosen standby is healthy and replaying WAL.
  2. Pause application writes and background jobs.
  3. Confirm replication replay has caught up.
  4. Promote the standby through the HA tool.
  5. Confirm pg_is_in_recovery() is false on the new primary.
  6. Move the writer endpoint.
  7. Rebuild or rewind the old primary as a standby.
  8. Resume writes and watch errors, replication, and connection counts.

For failover, the order changes:

  1. Confirm the primary is failed or unsafe.
  2. Fence the old primary.
  3. Choose the most advanced standby.
  4. Promote it through the HA tool.
  5. Move the writer endpoint once.
  6. Confirm writes work on the new primary.
  7. Verify replicas, slots, backups, and WAL archiving.
  8. Reintroduce the old primary only through rewind or rebuild.
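
One way to keep the failover order honest under pressure is to encode it, so a script or chat-ops bot refuses to skip ahead. The sketch below is only an illustration: the step names are hypothetical labels for the list above, and nothing here talks to PostgreSQL.

```python
FAILOVER_STEPS = [
    "confirm_primary_failed",
    "fence_old_primary",
    "choose_standby",
    "promote_standby",
    "move_writer_endpoint",
    "confirm_writes",
    "verify_replicas_slots_backups",
    "rewind_or_rebuild_old_primary",
]


def next_failover_step(completed):
    """Return the next step to perform, or None when the runbook is finished.

    `completed` is the list of steps already done, in the order they were done.
    Raises ValueError if a step was recorded out of order, for example a
    promotion logged before fencing.
    """
    for position, step in enumerate(completed):
        if position >= len(FAILOVER_STEPS) or step != FAILOVER_STEPS[position]:
            raise ValueError(f"step {step!r} is out of order at position {position + 1}")
    if len(completed) == len(FAILOVER_STEPS):
        return None
    return FAILOVER_STEPS[len(completed)]
```

The point of the guard is the ValueError: recording "promote_standby" before "fence_old_primary" fails loudly instead of quietly producing two writable primaries.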

The commands vary by tooling, but the safety properties do not: one writable primary, known replication state, tested client routing, and a clean way to bring failed nodes back.

Before you trust any HA design, rehearse both paths in a non-production environment. A switchover drill should prove that applications reconnect cleanly, the old primary can become a standby again, and monitoring follows the new role. A failover drill should prove something stricter: the failed primary is fenced, the standby chosen for promotion is the best available candidate, the application writer endpoint moves once, and the old primary cannot rejoin without rewind or rebuild.

The safest PostgreSQL HA teams treat failover as a tested operational workflow, not a heroic command typed during an outage.