How to Backup and Restore Databases Using pg_dump and pg_restore
Master PostgreSQL data protection with this comprehensive guide to `pg_dump` and `pg_restore`. Learn how to create reliable database backups using various formats, including plain-text, custom, and directory. Discover essential options for efficient dumping and explore step-by-step instructions for restoring your data. This tutorial covers best practices for scheduling, testing, and securely managing your PostgreSQL backups to ensure data integrity and recovery capabilities.
How to Backup and Restore Databases Using pg_dump and pg_restore
You need a PostgreSQL backup you can restore, not just a file that looks like a backup. pg_dump and pg_restore are the standard tools for logical backups: they copy database objects and data into a portable dump, then rebuild them later on the same server, a test server, or a new PostgreSQL instance.
This guide focuses on single-database logical backups. For whole-cluster disaster recovery, point-in-time recovery, or very large installations, pair this with physical backups and WAL archiving.
Understanding pg_dump
pg_dump connects to one database and exports its schema and data. It does not dump roles, tablespaces, or global cluster objects; use pg_dumpall --globals-only when you need those.
Output Formats
pg_dump supports multiple output formats:
- Plain text (
-Fp, the default): Writes SQL that you restore withpsql. It is easy to inspect and edit, but large restores can be slower and less flexible. - Custom archive (
-Fc): Writes a compressed archive forpg_restore. It supports selective restore and parallel restore. - Directory format (
-Fd): Writes a directory of files. It supports parallel dump and parallel restore, which helps for large databases. - Tar format (
-Ft): Writes a tar archive forpg_restore. It is a single file, but it does not support compression inside the format the same way custom archives do.
Common pg_dump Options
Here are some of the most frequently used options for pg_dump:
-h <hostname>: Connect to a host.-p <port>: Connect to a port.-U <username>: Connect as a user.-W: Force a password prompt before connecting.-f <filename>: Write to a file or directory instead of standard output.-F <format>: Choosep,c,d, ort.-Z <level>: Set compression level for formats that support compression.--exclude-table=<pattern>: Exclude matching tables.-j <number>or--jobs=<number>: Use parallel jobs when dumping in directory format.--clean: Include drop commands before create commands in plain SQL output.--create: Include commands to create the database itself.
Example: Basic Plain-Text Backup
To create a plain-text backup of a database named mydatabase owned by user myuser and save it to mydatabase_backup.sql:
pg_dump -U myuser -f mydatabase_backup.sql mydatabase
If you need to specify a host and port:
pg_dump -h localhost -p 5432 -U myuser -f mydatabase_backup.sql mydatabase
Depending on your authentication method, this may prompt for myuser's password.
Example: Custom Format Backup (Recommended)
Custom format is a good default for many production backups because it works with pg_restore and supports selective restore.
pg_dump -U myuser -Fc -f mydatabase_backup.dump mydatabase
This creates mydatabase_backup.dump in a format suitable for pg_restore.
Example: Directory Format Backup with Parallel Jobs
For large databases, parallel dumping can reduce backup time. It requires directory format:
pg_dump -U myuser -Fd -j 4 -f mydatabase_backup_dir mydatabase
This command will create a directory named mydatabase_backup_dir containing multiple files, utilizing 4 parallel jobs.
Understanding pg_restore
pg_restore restores custom, directory, or tar archives created by pg_dump. Plain SQL files are restored with psql, not pg_restore.
Common pg_restore Options
-h <hostname>: Connect to a host.-p <port>: Connect to a port.-U <username>: Connect as a user.-W: Force a password prompt before connecting.-d <database>: Connect to this database for the restore.-v: Print verbose restore output.-cor--clean: Drop database objects before recreating them.--create: Create the dumped database before restoring it. Connect with-dto an existing maintenance database, commonlypostgres.-j <number>: Use parallel jobs for restore where supported.-F <format>: Specify archive format if auto-detection is not enough.
Restoring from Plain-Text SQL Backups
If you created a plain-text SQL backup using pg_dump (e.g., mydatabase_backup.sql), you restore it using psql:
psql -U myuser -d mydatabase -f mydatabase_backup.sql
If the backup file includes CREATE DATABASE and \connect commands from pg_dump --create, connect to a maintenance database such as postgres:
psql -U myuser -f mydatabase_backup.sql postgres
Example: Restoring from a Custom Format Backup
To restore the custom format backup mydatabase_backup.dump into a database named restored_db:
pg_restore -U myuser -d restored_db mydatabase_backup.dump
If the archive was created with pg_dump --create, you can ask pg_restore to create the dumped database. The -d value must be an existing database used for the initial connection, not the new database being created:
pg_restore -U myuser --create -d postgres mydatabase_backup.dump
This creates the database named inside the dump and restores into it. If you want to restore into a differently named database, create that database first and restore without --create:
createdb -U myuser restored_db
pg_restore -U myuser -d restored_db mydatabase_backup.dump
Example: Parallel Restore
For faster restoration, especially with large datasets, use the -j option:
pg_restore -U myuser -d restored_db -j 4 mydatabase_backup.dump
This restores with 4 worker jobs. Parallel restore is most useful when the dump has multiple independent objects to load and index.
Example: Restoring from Directory Format Backup
To restore a backup created in directory format (e.g., mydatabase_backup_dir):
pg_restore -U myuser -d restored_db -j 4 mydatabase_backup_dir
Note that the last argument is the directory path.
Best Practices for Backups
- Schedule backups: Use
cron, systemd timers, or your orchestration platform. Pick a schedule based on how much data you can afford to lose. - Use the right format: Custom format is a strong default. Directory format is better when you need parallel dump for a large database.
- Test restores: Restore to a staging database on a schedule. A backup you never test is only a guess.
- Capture globals: Dump roles and tablespaces separately with
pg_dumpall --globals-onlywhen rebuilding a server from scratch. - Store copies offsite: Keep backups away from the primary server and protect them with encryption and access controls.
- Monitor jobs: Alert on failed dumps, unusually small backup files, and old backup timestamps.
- Plan retention: Decide how many hourly, daily, weekly, and monthly backups you keep, then automate cleanup.
- Be careful with
--clean: It drops objects in the target database. Use it only when you mean to replace existing objects.
Takeaway
Use pg_dump -Fc for a flexible single-file backup, pg_dump -Fd -j when you need parallel dumping, psql for plain SQL restores, and pg_restore for archive restores. Then prove the process by restoring to a test database before you need it in an emergency.