Setting Up Synchronous Replication for High Availability in PostgreSQL

Configuring PostgreSQL for high availability (HA) is crucial for any mission-critical application. While asynchronous replication offers performance benefits, it inherently carries a risk of data loss if the primary server fails before changes are fully transmitted to the standby. Synchronous streaming replication solves this by guaranteeing that a transaction is only considered committed once the data has been successfully written to the WAL (Write-Ahead Log) of both the primary and at least one designated standby server. This ensures a Recovery Point Objective (RPO) of zero.

This comprehensive guide walks you through the essential configuration steps required to establish a robust, zero-data-loss PostgreSQL replication setup. We will focus on the critical parameters like wal_level and synchronous_commit that underpin this high-availability architecture.

Prerequisites

Before starting, ensure you have two PostgreSQL servers set up (Primary and Standby) running identical major versions of PostgreSQL. Both servers must have network connectivity. For this guide, we assume:

Primary Hostname/IP: pg_primary
Standby Hostname/IP: pg_standby
Replication User: repl_user (created with appropriate permissions)
Database Name: mydb

Step 1: Configuring the Primary Server

The primary server requires specific settings to enable streaming replication and manage the Write-Ahead Log (WAL) required by synchronous commits.

A. Adjusting `postgresql.conf` on the Primary

Edit the primary server's postgresql.conf file. The following parameters are mandatory for streaming replication:

# --- Required for Replication ---
listen_addresses = '*'         # Allows connections from standby
wal_level = replica            # Must be 'replica' or higher (e.g., 'logical')
max_wal_senders = 10           # Max concurrent connections from standbys
max_replication_slots = 10     # Slots needed for persistent replication streams

# --- Essential for Synchronous Commit ---
synchronous_standby_names = '1 (standby_app_name)' # Specifies required standbys

# --- Optional but Recommended ---
wal_log_hints = on             # Recommended for safer replication, though it increases WAL volume
shared_preload_libraries = 'pg_stat_statements' # If using monitoring

Explanation of Key Parameters:

wal_level = replica: This ensures that sufficient information is written to the WAL to allow a standby server to reconstruct the database state. For synchronous commits, this level is the minimum requirement.
synchronous_standby_names: This is the core setting for defining which standbys must acknowledge writes. We define it using the (N (standby_name)) syntax. If N=1, at least one standby must confirm the write before the transaction commits.

B. Configuring Host-Based Authentication (`pg_hba.conf`)

The primary server must allow the replication user from the standby server(s) to connect for replication purposes.

Add an entry to pg_hba.conf on the primary:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     repl_user       pg_standby/32           scram-sha-256

Replace pg_standby/32 with the actual IP address or subnet of your standby server.

C. Creating the Replication Slot and User

Connect to PostgreSQL on the primary server to create the necessary user and the replication slot.

1. Create Replication User:

CREATE ROLE repl_user WITH REPLICATION LOGIN PASSWORD 'a_strong_password';

2. Create Replication Slot:

This slot ensures WAL segments are retained until the standby confirms receipt, preventing data loss if the standby temporarily disconnects.

SELECT pg_create_physical_replication_slot('standby_app_name');

Note on Naming: The name provided here (standby_app_name) must match the name specified in synchronous_standby_names on the primary.

D. Restarting the Primary

Apply all configuration changes by restarting the PostgreSQL service on the primary server.

sudo systemctl restart postgresql

Step 2: Configuring the Standby Server

The standby server is configured to stream WAL records from the primary using a recovery configuration.

A. Base Backup

Before starting streaming, the standby needs a full copy of the primary's data directory. Stop PostgreSQL on the standby first.

sudo systemctl stop postgresql

Take the base backup using pg_basebackup. Replace paths and connection details as necessary:

# Example using the pg_basebackup utility
pg_basebackup -h pg_primary -D /var/lib/postgresql/15/main/ -U repl_user -P -Xs -R -W

-D: The target data directory on the standby.
-U: The replication user.
-P: Show progress.
-Xs: Include necessary WAL files during the base backup.
-R: Automatically create the standby.signal file and generate the necessary connection settings in postgresql.auto.conf (or recovery configuration).

B. Configuring `postgresql.conf` on the Standby

On the standby, ensure postgresql.conf permits it to act as a replica. The key setting here is setting the application name, which must match the slot name used on the primary.

# --- Required on Standby ---
primary_conninfo = 'host=pg_primary port=5432 user=repl_user password=a_strong_password application_name=standby_app_name'
hot_standby = on          # Allows read queries during recovery/standby mode

C. Starting the Standby

Start the PostgreSQL service on the standby server.

sudo systemctl start postgresql

Step 3: Verification and Testing Synchronous Commit

Once both servers are running, verify the connection and then test the synchronous behavior.

A. Verifying Replication Status

Connect to the primary database and check the pg_stat_replication view:

SELECT client_addr, application_name, state, sync_state FROM pg_stat_replication;

You should see an entry for standby_app_name with sync_state as sync.

B. Testing Synchronous Commit

The global parameter that dictates how hard PostgreSQL waits is synchronous_commit. For RPO=0, you must use a value that forces synchronization.

1. Setting Global Behavior

If you configured synchronous_standby_names on the primary as shown in Step 1, the default behavior will enforce waiting for the required standbys if synchronous_commit is set to on (the default) or remote_write.

For the strongest guarantee, explicitly set it in postgresql.conf to remote_write or remote_apply (if you need the standby to have flushed data to disk, not just received it).

# In postgresql.conf on Primary
synchronous_commit = remote_write

Warning: Setting synchronous_commit = remote_write or on significantly increases transaction latency compared to asynchronous modes (off or local). This latency directly correlates with the network speed between the primary and the synchronous standby.

2. Testing Within a Transaction

To test transactionally (without requiring a global configuration change), you can set it per session or transaction:

-- Connect to Primary

BEGIN;
SET LOCAL synchronous_commit = remote_write;

INSERT INTO sales (item, amount) VALUES ('Widget A', 100);
-- This INSERT will block until 'standby_app_name' confirms receipt.

COMMIT;
-- The COMMIT succeeds only after the standby acknowledges the WAL write.

If the connection to the synchronous standby is lost during the transaction, the primary will either wait indefinitely (if the standby disconnects cleanly) or fall back based on the synchronous_commit_fallback_on_error setting (defaulting to on, which means the transaction might fail or hang if the primary cannot confirm sync status).

Best Practices for Synchronous HA

Use Dedicated Standbys: Only assign standbys that are physically close (low latency) to the primary to your synchronous replication list. High latency will severely impact write performance.
Monitor Replication Lag: Even in synchronous mode, monitor the standby lag. A slow standby that is still technically 'sync' but taking too long to process WAL can still impact user experience.
Connection Fallback: Understand the synchronous_commit_fallback_on_error setting. If set to off, a failure to communicate with the synchronous standby during a commit will cause the transaction on the primary to fail, preventing potential data divergence, but immediately impacting availability.
Use Multiple Standbys: For maximum redundancy within synchronous setup, configure synchronous_standby_names = '2 (standby1, standby2)' to require commits from two distinct standbys.