Mastering PostgreSQL Replication: Types and Setup Explained

PostgreSQL replication keeps a second server close enough to the primary that you can survive hardware failure, shift read traffic, or run a controlled migration. If your database is a production dependency, you need to know which PostgreSQL replication model fits your risk tolerance before a node fails.

PostgreSQL gives you two common choices: streaming replication and logical replication. Streaming replication copies WAL at the physical cluster level. Logical replication sends row-level changes from selected tables through publications and subscriptions.

Why PostgreSQL Replication Matters

Replication helps with four everyday operations problems:

High availability: If the primary fails, you can promote a standby and point applications at it.
Disaster recovery: A standby in another location can protect you from a site-level outage.
Read scaling: Read-only queries can run against hot standbys instead of the write primary.
Migration support: Logical replication can help move selected tables between PostgreSQL versions or database layouts.

Replication is not a backup replacement. A bug, bad migration, or accidental DELETE can replicate quickly. Keep tested backups and point-in-time recovery alongside replication.

Streaming Replication (Physical Replication)

Streaming replication is the most common and fundamental form of replication in PostgreSQL. It works by sending Write-Ahead Log (WAL) records from the primary server to one or more replicas. These WAL records represent every change made to the database. The replicas then apply these WAL records to their own data files, ensuring they remain consistent with the primary.

Synchronous vs. Asynchronous Streaming

Synchronous replication makes the primary wait for one or more synchronous standbys before it reports a commit to the client. The exact safety level depends on synchronous_commit; for example, waiting for WAL to be written is different from waiting for it to be replayed. You get stronger protection against losing acknowledged commits, but every commit now depends on replica and network latency.

Asynchronous replication lets the primary commit locally and send WAL to replicas afterward. It is faster for writes, but a primary crash can lose recent transactions that had not reached a standby yet.

Setting Up Streaming Replication (Asynchronous Example)

Setting up streaming replication involves configuring both the primary and replica servers. Here’s a simplified guide:

1. Configure the Primary Server (`postgresql.conf` and `pg_hba.conf`)

On the primary server, you need to enable WAL archiving and replication connections.

postgresql.conf modifications:
```
wal_level = replica  # or logical for logical replication
max_wal_senders = 5  # Number of concurrent replication connections
wal_keep_size = 512MB # Or wal_keep_segments for older versions
# For synchronous replication, add:
# synchronous_standby_names = 'replica1,replica2'
# Or for specific server name/priority:
# synchronous_standby_names = '1 (replica1), 2 (replica2)'
archive_mode = on
archive_command = 'test ! -f /path/to/wal_archive/%f && cp %p /path/to/wal_archive/%f'
```
- wal_level: Must be at least replica for streaming replication.
- max_wal_senders: Specifies how many standby servers can connect simultaneously.
- wal_keep_size: Prevents WAL files from being deleted before replicas can fetch them (a simpler alternative to archive_command for basic setups, but archiving is recommended for robustness).
- archive_mode and archive_command: Useful for point-in-time recovery (PITR) and for replicas that need old WAL after falling behind. In production, use a real archive target or a backup tool instead of a local copy command.

pg_hba.conf modifications:

Allow the replica to connect for replication. Replace replica_ip_address with the actual IP of your replica.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     replication_user  replica_ip_address/32   md5

You'll also need to create a replication user:

-- On the primary server:
CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'your_password';

After modifying these files, reload the PostgreSQL configuration:

pg_ctl reload
# Or restart PostgreSQL if needed

2. Prepare the Replica Server

Before starting the replica, it must have a data directory that is a copy of the primary's data directory at a specific point in time. The easiest way is to use pg_basebackup.

Stop PostgreSQL on the replica (if running).
Take a base backup:
```
# Ensure PGDATA is empty or removed first
pg_basebackup -h primary_host_ip -p 5432 -U replication_user -D /var/lib/postgresql/data/ -Fp -Xs -P -R
```
- -h, -p, -U: Specify the primary server's connection details.
- -D: The data directory for the replica.
- -Fp: Format is plain.
- -Xs: Stream WAL during the backup.
- -P: Show progress.
- -R: Write standby connection settings and create standby.signal for PostgreSQL 12 and newer.
- You will be prompted for the replication_user's password.

3. Configure the Replica Server

postgresql.conf modifications (for PG12+):

hot_standby = on # Allows read-only queries on the replica
primary_conninfo = 'host=primary_host_ip port=5432 user=replication_user password=your_password'

hot_standby: Enables read-only queries on the standby.
primary_conninfo: Connection string to the primary server.

Older PostgreSQL versions:

PostgreSQL 12 removed recovery.conf. If you maintain an older server, create recovery.conf in the replica data directory:
```
standby_mode = 'on'
primary_conninfo = 'host=primary_host_ip port=5432 user=replication_user password=your_password'
# If using archive recovery instead of streaming, you'd specify restore_command
# restore_command = 'cp /path/to/wal_archive/%f %p'
# recovery_target_timeline = 'latest'
```
On PostgreSQL 12 and newer, standby mode is controlled by standby.signal, and primary_conninfo usually lives in postgresql.auto.conf when created by pg_basebackup -R.

4. Start the Replica Server

Start the PostgreSQL service on the replica. It will connect to the primary, receive WAL records, and begin to synchronize. You can check the logs for confirmation.

Tip: For robust HA, consider using tools like Patroni or repmgr, which automate failover and management.

Logical Replication

Logical replication is a more flexible and granular form of replication introduced in PostgreSQL 10. Instead of replicating entire blocks of data or WAL records, it replicates data changes based on their logical meaning (e.g., INSERT, UPDATE, DELETE statements) at the row level. This is achieved by decoding the WAL records into a stream of logical changes.

Key Features and Use Cases:

Selective replication: You can choose which tables to replicate. Recent PostgreSQL versions also support column lists in publications, but check your server version before relying on that feature.
Cross-Version Replication: Logical replication can replicate data between different major versions of PostgreSQL.
Schema control: Logical replication does not automatically replicate DDL. Create matching tables and apply schema migrations on the subscriber.
Data Transformation: While not built-in, logical replication provides a foundation for more complex ETL (Extract, Transform, Load) processes.
Replication from a Primary to a Replica that is not a full clone: The target database does not need to be a complete physical copy of the source.

How it Works:

Publisher: The source database (primary) where data changes occur. It needs wal_level = logical. Changes are decoded from WAL into a logical stream.
Publication: A named set of tables on the publisher whose changes will be replicated.
Subscriber: The target database (replica) that receives the changes.
Subscription: A connection on the subscriber that connects to the publisher and applies the changes from a specific publication.

Setting Up Logical Replication

1. Configure the Publisher (Primary Server)

postgresql.conf modifications:

wal_level = logical
max_replication_slots = 10 # For logical replication slots
max_wal_senders = 10     # Should be at least max_replication_slots

Create a Publication:

-- On the publisher database:
CREATE PUBLICATION my_publication FOR TABLE 
    table1, 
    table2 
    WITH (publish = 'insert,update,delete');

-- Or for all tables:
-- CREATE PUBLICATION all_tables_pub FOR ALL TABLES;

Reload the configuration on the publisher.

2. Configure the Subscriber (Replica Server)

Ensure the target tables exist: The subscriber database must have the target tables with the same schema as the publisher. You can create them manually or use pg_dump to extract schema.
Create a Subscription:
```
-- On the subscriber database:
CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher_host_ip port=5432 user=replication_user password=your_password dbname=publisher_db'
    PUBLICATION my_publication;
```
The replication_user needs appropriate permissions on the publisher.

PostgreSQL will automatically create a replication slot on the publisher and start applying changes. You can monitor the subscription status using pg_stat_subscription on the subscriber.

Tip: Logical replication uses PostgreSQL's built-in logical decoding infrastructure. It does not require a separate extension for basic publications and subscriptions.

Choosing the Right Replication Method

Streaming Replication: Ideal for high availability and disaster recovery, where you need an exact, byte-for-byte copy of the primary. It's simpler to set up for full database replication and provides the best read-scalability for read-only replicas.
Logical Replication: Best suited for selective data distribution, migrations, cross-version upgrades, or when you need to replicate only a subset of data. It allows for more complex scenarios like replicating to different schemas or performing data transformations.

Takeaway

Use streaming replication when you need a full standby for failover, disaster recovery, or read-only traffic. Use logical replication when you need selected tables, cross-version migration, or controlled data movement between different databases.

Before you trust either setup, run a failover drill, check application connection handling, monitor replication lag, and verify that backups still restore cleanly. Replication keeps another server current; it does not replace operational testing.