Step-by-Step Guide to Setting Up PostgreSQL Streaming Replication

Configure PostgreSQL streaming replication with pg_basebackup, pg_hba.conf, standby.signal, and verification queries.

Step-by-Step Guide to Setting Up PostgreSQL Streaming Replication

Streaming replication is the foundational mechanism for achieving high availability (HA) and read scalability in PostgreSQL environments. By configuring a primary (master) server to continuously stream Write-Ahead Log (WAL) records to one or more standby (replica) servers, you ensure data synchronization with minimal lag.

This guide walks through asynchronous streaming replication using pg_basebackup, pg_hba.conf, and standby signal files. You will end with a working primary-standby pair and the checks needed to prove it is actually streaming.

Prerequisites and Environment Setup

Before beginning, ensure the following prerequisites are met. This guide assumes two servers, Primary and Standby, running the same major version of PostgreSQL (version 12 or newer is recommended).

Server Role IP Address (Example)
Primary Source of truth 192.168.1.10
Standby Replica 192.168.1.11
  • User: You must have administrative access (e.g., sudo or the postgres system user) on both servers.
  • Network: The Standby server must be able to connect to the Primary server on the PostgreSQL port (default 5432).

Step 1: Configure the Primary Server

The primary server must be configured to generate and serve WAL files for replication.

1.1 Modify postgresql.conf

Edit the main configuration file. On Debian and Ubuntu packages this is often under /etc/postgresql/<version>/main/postgresql.conf; on many source or container installs it lives in the data directory. Set these parameters:

# Allow connections from other hosts
listen_addresses = '*'

# Set WAL level to 'replica' or higher
wal_level = replica

# Maximum number of concurrent connections from standby servers
max_wal_senders = 5 

# Controls the number of standby connections that can be active simultaneously
max_replication_slots = 5

# Allows read-only queries on the standby
hot_standby = on

1.2 Create a Dedicated Replication User

For security, create a specific user with the REPLICATION attribute. This user will be used only by the standby server to pull WAL records.

# Connect to PostgreSQL
sudo -u postgres psql -c "CREATE ROLE replica_user WITH REPLICATION LOGIN PASSWORD 'use-a-real-secret-here';"

1.3 Update Client Authentication (pg_hba.conf)

Allow the replication user from the standby server's IP address to connect to the special replication pseudo-database.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     replica_user    192.168.1.11/32         md5

1.4 Restart the Primary Server

Apply the configuration changes. A restart is the simple option after changing listen_addresses; if you only changed pg_hba.conf, a reload is enough.

sudo systemctl restart postgresql

Step 2: Prepare the Standby Server

Before cloning the data, ensure the standby's PostgreSQL service is stopped and its existing data directory is cleared.

2.1 Stop Standby PostgreSQL Service

sudo systemctl stop postgresql

2.2 Clear the Data Directory

Warning: This step permanently deletes all data currently in the standby's data directory. Confirm the path before execution.

# Example path for PG 14
PG_DATA=/var/lib/postgresql/14/main

sudo rm -rf $PG_DATA/*

2.3 Clone Data Using pg_basebackup

Use pg_basebackup to create an exact copy of the primary's data directory. The -R flag is crucial as it automatically generates the necessary configuration files (standby.signal and primary_conninfo) needed for streaming replication (PostgreSQL 12+).

Run this command on the Standby server:

PG_DATA=/var/lib/postgresql/14/main

sudo -u postgres pg_basebackup -h 192.168.1.10 -D $PG_DATA -U replica_user -P -v -R
Option Description
-h Primary server hostname/IP address.
-D Local data directory path.
-U Replication username (replica_user).
-P Show progress.
-v Verbose output.
-R Create a replication configuration file automatically.

Step 3: Configure and Start the Standby

3.1 Verify Standby Configuration

If you used the -R flag in Step 2.3, pg_basebackup created a standby.signal file and populated the primary_conninfo setting, usually in a generated configuration file named postgresql.auto.conf within the data directory.

Verify the contents of the primary_conninfo string. It should look similar to this (check inside $PG_DATA/postgresql.auto.conf):

primary_conninfo = 'host=192.168.1.10 user=replica_user password=SuperSecurePassword123 application_name=standby_node'

Tip: Ensure the password is included in primary_conninfo or that you are using certificate-based authentication. If using pg_hba.conf with trust or cert, the password may be omitted.

3.2 Start the Standby Service

Since the required signal file (standby.signal) is present in the data directory, the service will start in read-only standby mode and immediately attempt to connect to the primary.

sudo systemctl start postgresql

Step 4: Verify Streaming Replication

After starting the standby, you must confirm that the connection is active and data synchronization is occurring.

4.1 Verification on the Primary Server

Connect to the primary server and query the pg_stat_replication view. You should see a row representing the connection from the standby server.

psql -c "SELECT client_addr, state, sync_state, sent_lsn, write_lsn, flush_lsn FROM pg_stat_replication;"

Expected Output (Key Fields):

  • client_addr: Should match the IP of the standby server (e.g., 192.168.1.11).
  • state: Should be streaming. If it shows startup or catching up, wait a moment. If it shows walsender starting up, you're close.
  • sync_state: Should be async (for standard asynchronous replication).

4.2 Testing Data Synchronization

To confirm data flow, execute a change on the primary and immediately check for its existence on the standby.

On Primary:

CREATE TABLE replication_test (id serial primary key, message text);
INSERT INTO replication_test (message) VALUES ('Data synchronized successfully');

On Standby (Read-Only):

-- This must succeed without error
psql -c "SELECT * FROM replication_test;"

If the data is visible on the standby, streaming replication is successfully configured and active.

Best Practices and Troubleshooting

Persistent Connection: Replication Slots

While optional, replication slots are highly recommended. A replication slot ensures the primary server does not prematurely discard WAL segments needed by the standby, even if the standby temporarily disconnects.

On Primary:

SELECT * FROM pg_create_physical_replication_slot('standby_slot_name');

Then set primary_slot_name on the standby. Do not put the slot name inside primary_conninfo.

primary_conninfo = 'host=192.168.1.10 user=replica_user password=use-a-real-secret-here application_name=standby_node'
primary_slot_name = 'standby_slot_name'

Warning: Replication slots require careful monitoring. If a standby fails for an extended period, the accumulated WAL files protected by the slot can cause the primary server's disk space to fill rapidly.

Troubleshooting Common Issues

Issue Potential Cause Solution
Standby cannot connect Network firewall, wrong listen_addresses, or incorrect pg_hba.conf on primary. Verify port 5432 is reachable; confirm pg_hba.conf matches the standby IP and user.
pg_basebackup fails with authentication error Incorrect password or missing host entry in pg_hba.conf. Double-check password for replica_user; ensure the primary database is restarted after modifying pg_hba.conf.
Standby is read-only This is the expected behavior. The presence of standby.signal forces the server into recovery mode.

Next Step

Setting up streaming replication is a critical step in building a resilient PostgreSQL architecture. By following these steps, you have successfully configured a primary-standby pair that ensures continuous data synchronization, significantly enhancing your system's high availability capabilities. The next logical step is to integrate a monitoring solution and a failover mechanism (like Patroni or Repmgr) to fully automate the HA setup.