Replication and High Availability

85 min

PostgreSQL replication enables you to maintain multiple copies of your database for high availability, disaster recovery, and read scaling. Replication ensures that if the primary database fails, a standby can take over with minimal data loss. Replication also enables distributing read queries across multiple servers, improving performance for read-heavy workloads. Understanding replication is essential for building production-grade PostgreSQL deployments.

Streaming replication (physical replication) continuously copies WAL (Write-Ahead Log) records from the primary server to standby servers in real-time. Standby servers apply these WAL records to maintain an identical copy of the primary database. Streaming replication provides byte-for-byte identical copies, making it ideal for high availability and disaster recovery. Standby servers can be used for read queries (hot standby) or kept offline for disaster recovery (warm standby).

Synchronous replication ensures that transactions are committed on both primary and standby before returning success to the client. This provides zero data loss but can impact performance due to network latency. Asynchronous replication (default) commits on the primary immediately and replicates to standbys asynchronously, providing better performance but potential for small data loss if the primary fails. Choosing between synchronous and asynchronous depends on your availability and performance requirements.

Logical replication (introduced in PostgreSQL 10) replicates data changes at the logical level rather than physical WAL records. Logical replication enables replicating specific tables or databases, cross-version replication, and replicating to different database systems. Logical replication is more flexible than streaming replication but has more overhead and doesn't replicate everything (DDL, sequences, etc.). It's ideal for use cases requiring selective replication or cross-version upgrades.

High availability setups typically use streaming replication with automatic failover. Tools like Patroni, repmgr, or pg_auto_failover automate failover processes, detecting primary failures and promoting standbys automatically. These tools handle the complexity of maintaining replication, monitoring health, and coordinating failovers. Understanding high availability architectures enables you to design systems that survive hardware failures with minimal downtime.

Read scaling uses replication to distribute read queries across multiple servers. You can configure application load balancers to route read queries to standby servers while writes go to the primary. This improves performance for read-heavy workloads. However, replication lag means standbys may have slightly stale data, which applications must account for. Understanding replication lag and consistency requirements helps you design appropriate read scaling strategies.

Key Concepts

Replication maintains multiple database copies for availability and scaling.
Streaming replication copies WAL records for byte-for-byte identical copies.
Synchronous replication ensures zero data loss but impacts performance.
Logical replication enables selective table/database replication.
High availability requires automatic failover mechanisms.

Learning Objectives

Master

Setting up streaming replication between PostgreSQL servers
Configuring synchronous and asynchronous replication
Understanding logical replication and its use cases
Implementing high availability with automatic failover

Develop

Understanding database high availability architectures
Designing disaster recovery strategies
Implementing read scaling with replication

Tips

Use streaming replication for high availability and disaster recovery.
Configure wal_level = replica and max_wal_senders for replication.
Use synchronous replication only when zero data loss is critical.
Monitor replication lag: SELECT * FROM pg_stat_replication;

Common Pitfalls

Not monitoring replication lag, causing stale reads or replication failures.
Using synchronous replication without understanding performance impact.
Not testing failover procedures, unable to recover during actual failures.
Not configuring proper backup strategies alongside replication.

Summary

Replication enables high availability and read scaling.
Streaming replication provides real-time data copying.
Logical replication enables flexible, selective replication.
High availability requires automatic failover mechanisms.

Exercise

Set up basic streaming replication between primary and standby servers.

-- On primary server (postgresql.conf)
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3

-- On primary server (pg_hba.conf)
host    replication     replicator     192.168.1.0/24        md5

-- Create replication user on primary
CREATE USER replicator REPLICATION LOGIN PASSWORD 'repl_password';

-- On standby server, create recovery.conf
primary_conninfo = 'host=primary_server port=5432 user=replicator password=repl_password'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/archive %r'

-- Start standby server
pg_ctl start -D /var/lib/postgresql/data

-- Check replication status on primary
SELECT * FROM pg_stat_replication;

-- Check replication lag
SELECT 
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn
FROM pg_stat_replication;

Exercise Tips

Use pg_basebackup for initial standby setup from primary.
Monitor replication status regularly: SELECT * FROM pg_stat_replication;
Test failover procedures regularly to ensure they work when needed.
Use replication slots to prevent WAL deletion before replication.

Replication and High Availability