Detecting and Eliminating Database Bloat in PostgreSQL Using VACUUM

Detect PostgreSQL bloat with catalog stats, tune autovacuum, and choose VACUUM, VACUUM FULL, REINDEX, or pg_repack safely.

Detecting and Eliminating Database Bloat in PostgreSQL Using VACUUM

PostgreSQL database bloat happens when old row versions and oversized indexes take more space than your live data needs. You may notice bigger backups, slower scans, rising storage bills, or autovacuum working constantly without the table getting smaller on disk.

The fix depends on what kind of space you need to reclaim. Standard VACUUM makes dead tuple space reusable inside PostgreSQL. VACUUM FULL, REINDEX, and tools like pg_repack can physically shrink objects, but they come with different locking and operational costs.

Understanding PostgreSQL MVCC and Bloat

To effectively combat bloat, we must first understand its root cause. PostgreSQL's MVCC architecture ensures that readers never block writers and vice versa. When a row is updated, PostgreSQL doesn't overwrite the old row; it inserts a new version and marks the old version as dead. Similarly, deleted rows leave behind dead tuples.

Bloat occurs when these dead tuples accumulate faster than the maintenance processes (Autovacuum or manual VACUUM) can clean them up or reuse the space.

Consequences of Database Bloat

Bloat impacts performance in several key areas:

  1. Increased Disk Space Usage: Dead tuples occupy physical space, forcing tables and indexes to consume more storage than necessary.
  2. Slower Sequential Scans: The database engine must read past dead tuples during table scans, increasing I/O load.
  3. Inefficient Indexing: Bloated indexes are larger, leading to more disk reads to traverse the index structure.
  4. Wasted Autovacuum Efforts: Autovacuum has to work harder and longer to clean up tables, potentially delaying critical maintenance on other tables.

Detecting Database Bloat

Detection starts with catalog statistics and object sizes. Treat simple queries as triage, not exact bloat measurement, because table layout, fillfactor, TOAST data, and indexes all affect the real number.

1. Identifying Bloated Tables using pg_stat_user_tables

The pg_stat_user_tables view provides statistics on user-defined tables. We can calculate the approximate bloat by comparing the total size allocated to the table versus the size of the live data.

Key Metrics to Monitor:

  • n_dead_tup: Number of dead tuples.
  • last_autovacuum, last_vacuum: When maintenance was last run.

Simple counts are useful because they show where vacuum pressure is building. A large n_dead_tup compared with n_live_tup is a good reason to inspect a table more closely.

Example Query (Finding Vacuum Candidates):

This query highlights tables with many dead tuples and shows when vacuum last ran:

SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_tuple_pct,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    last_autovacuum,
    last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY
    n_dead_tup DESC
LIMIT 10;

2. Assessing Bloated Indexes

Bloat often affects indexes significantly. Start by looking for unusually large indexes on high-churn tables:

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

Large size alone does not prove bloat, but it tells you which indexes are worth checking with deeper tools such as the pgstattuple extension or your monitoring platform's bloat query.

Managing Bloat: The Role of VACUUM

VACUUM is PostgreSQL's primary tool for reclaiming space from dead tuples and updating visibility maps.

Autovacuum: The First Line of Defense

By default, PostgreSQL runs autovacuum processes automatically. Autovacuum performs a standard VACUUM (which marks space as reusable internally but does not release it back to the OS) when a threshold is met. This threshold is defined by autovacuum_vacuum_scale_factor (default 0.2 or 20% of the table size) plus autovacuum_vacuum_threshold (default 50 tuples).

Configuration Tip: For high-churn tables, consider lowering the scale_factor to trigger maintenance sooner, preventing large bloat accumulation.

-- Example: Setting aggressive autovacuum parameters for a critical table 'orders'
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_threshold = 100);

Standard VACUUM vs. VACUUM FULL

There are two primary modes of cleanup:

Standard VACUUM

A standard VACUUM marks dead tuples for reuse within the existing physical file. It does not shrink the table file size on disk. This is non-blocking and safe for high-traffic tables.

VACUUM table_name;
VACUUM (VERBOSE) table_name; -- Shows statistics about cleaned tuples

VACUUM FULL (The Space Reclamation Tool)

VACUUM FULL rewrites the entire table file to physically remove dead tuples and reclaim space back to the operating system.

Warning: VACUUM FULL requires an ACCESS EXCLUSIVE lock on the table for its duration. This means all read and write operations on that table will be blocked until VACUUM FULL completes. Use this command judiciously on large, heavily used tables.

VACUUM FULL table_name;

Best Practice: Only use VACUUM FULL when the bloat is severe, and you can afford downtime, or during scheduled maintenance windows.

Advanced Anti-Bloat Strategies

When VACUUM FULL is too disruptive, alternative methods exist to reclaim space with less downtime.

1. Rebuilding Indexes

Individual indexes can be rebuilt when index bloat is the main issue. For busy systems, prefer the concurrent form so reads and writes can continue for most of the operation:

REINDEX INDEX CONCURRENTLY index_name;

Plain REINDEX INDEX index_name; is faster but takes stronger locks, so use it during a maintenance window.

2. Using pg_repack for Online Table Rewrites

The pg_repack utility is the preferred method for eliminating table bloat with minimal downtime. It works by creating a new, clean copy of the table structure and data alongside the old table, synchronously applying changes, and then atomically swapping the tables.

How pg_repack works:

  1. It creates a temporary table (_new) mirroring the original.
  2. It continuously monitors changes on the original table using triggers.
  3. It performs a final synchronized copy and swap.

Installation and Usage (Typical Flow):

Install the extension and CLI package for your PostgreSQL version, enable the extension in the database, then run the pg_repack command from a shell:

CREATE EXTENSION pg_repack;
pg_repack --table=public.critical_table --dbname=mydb

Note on pg_repack: While it significantly reduces locking compared to VACUUM FULL, it still requires creating triggers and copying data, which consumes extra I/O and storage temporarily.

Takeaway

Database bloat is manageable when you monitor it before it becomes an outage. Prevention through tuned autovacuum is better than emergency rewrites. When bloat occurs, follow this hierarchy:

  1. Monitor: Regularly check pg_stat_user_tables for high n_dead_tup counts.
  2. Tune Autovacuum: For active tables, lower the scale factor to ensure standard VACUUM runs more frequently.
  3. Repair: If bloat is minor, a standard VACUUM table_name might suffice if table activity drops off.
  4. Aggressive Repair (Low Downtime): Use pg_repack to rewrite the table structure online.
  5. Emergency Repair (High Downtime): Use VACUUM FULL only as a last resort when downtime is acceptable, as it holds exclusive locks.

Before any rewrite, check for long-running transactions and make sure you have enough free disk space for the operation.