Step-by-Step Guide to Setting Up PostgreSQL Streaming Replication
Streaming replication is the foundational mechanism for achieving high availability (HA) and read scalability in PostgreSQL environments. By configuring a primary (master) server to continuously stream Write-Ahead Log (WAL) records to one or more standby (replica) servers, you ensure data synchronization with minimal lag.
This comprehensive guide walks through the essential steps required to establish robust, asynchronous streaming replication. We cover the necessary configuration changes on both the primary and standby servers, utilizing modern PostgreSQL features like pg_basebackup and signal files for simplified setup. Following this tutorial will equip you with a reliable foundation for robust PostgreSQL operation.
Prerequisites and Environment Setup
Before beginning, ensure the following prerequisites are met. This guide assumes two servers, Primary and Standby, running the same major version of PostgreSQL (version 12 or newer is recommended).
| Server | Role | IP Address (Example) |
|---|---|---|
| Primary | Source of truth | 192.168.1.10 |
| Standby | Replica | 192.168.1.11 |
- User: You must have administrative access (e.g.,
sudoor thepostgressystem user) on both servers. - Network: The Standby server must be able to connect to the Primary server on the PostgreSQL port (default 5432).
Step 1: Configure the Primary Server
The primary server must be configured to generate and serve WAL files for replication.
1.1 Modify postgresql.conf
Edit the main configuration file, typically located in the data directory (e.g., /etc/postgresql/14/main/postgresql.conf), and set the following parameters:
# Allow connections from other hosts
listen_addresses = '*'
# Set WAL level to 'replica' or higher
wal_level = replica
# Maximum number of concurrent connections from standby servers
max_wal_senders = 5
# Controls the number of standby connections that can be active simultaneously
max_replication_slots = 5
# Required for read-only queries on the standby (Hot Standby)
hot_standby = on
1.2 Create a Dedicated Replication User
For security, create a specific user with the REPLICATION attribute. This user will be used only by the standby server to pull WAL records.
# Connect to PostgreSQL
psql -c "CREATE USER replica_user REPLICATION ENCRYPTED PASSWORD 'SuperSecurePassword123';"
1.3 Update Client Authentication (pg_hba.conf)
Allow the replication user from the standby server's IP address to connect to the special replication pseudo-database.
# TYPE DATABASE USER ADDRESS METHOD
host replication replica_user 192.168.1.11/32 md5
1.4 Restart the Primary Server
Apply the changes by restarting the PostgreSQL service:
sudo systemctl restart postgresql
Step 2: Prepare the Standby Server
Before cloning the data, ensure the standby's PostgreSQL service is stopped and its existing data directory is cleared.
2.1 Stop Standby PostgreSQL Service
sudo systemctl stop postgresql
2.2 Clear the Data Directory
Warning: This step permanently deletes all data currently in the standby's data directory. Confirm the path before execution.
# Example path for PG 14
PG_DATA=/var/lib/postgresql/14/main
sudo rm -rf $PG_DATA/*
2.3 Clone Data Using pg_basebackup
Use pg_basebackup to create an exact copy of the primary's data directory. The -R flag is crucial as it automatically generates the necessary configuration files (standby.signal and primary_conninfo) needed for streaming replication (PostgreSQL 12+).
Run this command on the Standby server:
PG_DATA=/var/lib/postgresql/14/main
sudo -u postgres pg_basebackup -h 192.168.1.10 -D $PG_DATA -U replica_user -P -v -R
| Option | Description |
|---|---|
-h |
Primary server hostname/IP address. |
-D |
Local data directory path. |
-U |
Replication username (replica_user). |
-P |
Show progress. |
-v |
Verbose output. |
-R |
Create a replication configuration file automatically. |
Step 3: Configure and Start the Standby
3.1 Verify Standby Configuration
If you used the -R flag in Step 2.3, pg_basebackup created a standby.signal file and populated the primary_conninfo setting, usually in a generated configuration file named postgresql.auto.conf within the data directory.
Verify the contents of the primary_conninfo string. It should look similar to this (check inside $PG_DATA/postgresql.auto.conf):
primary_conninfo = 'host=192.168.1.10 user=replica_user password=SuperSecurePassword123 application_name=standby_node'
Tip: Ensure the password is included in
primary_conninfoor that you are using certificate-based authentication. If usingpg_hba.confwithtrustorcert, the password may be omitted.
3.2 Start the Standby Service
Since the required signal file (standby.signal) is present in the data directory, the service will start in read-only standby mode and immediately attempt to connect to the primary.
sudo systemctl start postgresql
Step 4: Verify Streaming Replication
After starting the standby, you must confirm that the connection is active and data synchronization is occurring.
4.1 Verification on the Primary Server
Connect to the primary server and query the pg_stat_replication view. You should see a row representing the connection from the standby server.
psql -c "SELECT client_addr, state, sync_state, sent_lsn, write_lsn, flush_lsn FROM pg_stat_replication;"
Expected Output (Key Fields):
client_addr: Should match the IP of the standby server (e.g.,192.168.1.11).state: Should bestreaming. If it showsstartuporcatching up, wait a moment. If it showswalsenderstarting up, you're close.sync_state: Should beasync(for standard asynchronous replication).
4.2 Testing Data Synchronization
To confirm data flow, execute a change on the primary and immediately check for its existence on the standby.
On Primary:
CREATE TABLE replication_test (id serial primary key, message text);
INSERT INTO replication_test (message) VALUES ('Data synchronized successfully');
On Standby (Read-Only):
-- This must succeed without error
psql -c "SELECT * FROM replication_test;"
If the data is visible on the standby, streaming replication is successfully configured and active.
Best Practices and Troubleshooting
Persistent Connection: Replication Slots
While optional, replication slots are highly recommended. A replication slot ensures the primary server does not prematurely discard WAL segments needed by the standby, even if the standby temporarily disconnects.
On Primary:
SELECT * FROM pg_create_physical_replication_slot('standby_slot_name');
Then, update the primary_conninfo on the standby to utilize this slot:
primary_conninfo = 'host=192.168.1.10 user=replica_user ... application_name=standby_node **slotname=standby_slot_name**'
Warning: Replication slots require careful monitoring. If a standby fails for an extended period, the accumulated WAL files protected by the slot can cause the primary server's disk space to fill rapidly.
Troubleshooting Common Issues
| Issue | Potential Cause | Solution |
|---|---|---|
Standby stuck in starting |
Network firewall or incorrect pg_hba.conf on primary. |
Verify port 5432 is open; confirm pg_hba.conf entry matches the standby IP and user. |
pg_basebackup fails with authentication error |
Incorrect password or missing host entry in pg_hba.conf. |
Double-check password for replica_user; ensure the primary database is restarted after modifying pg_hba.conf. |
| Standby is read-only | This is the expected behavior. | The presence of standby.signal forces the server into recovery mode. |
Conclusion
Setting up streaming replication is a critical step in building a resilient PostgreSQL architecture. By following these steps, you have successfully configured a primary-standby pair that ensures continuous data synchronization, significantly enhancing your system's high availability capabilities. The next logical step is to integrate a monitoring solution and a failover mechanism (like Patroni or Repmgr) to fully automate the HA setup.