Best Practices for Preventing Data Loss: RDB vs. AOF Configuration

Protect your Redis data from loss by mastering RDB snapshots and AOF persistence. This comprehensive guide compares both methods, details their configurations in `redis.conf`, and outlines best practices. Learn how to combine RDB and AOF, choose the optimal `appendfsync` policy, manage AOF rewriting, and implement monitoring to ensure data durability and fast recovery after failures.

Redis persistence is easy to misunderstand because Redis feels like a database but behaves like memory first. If you restart it with no persistence, the data is gone. If persistence is enabled but tuned carelessly, you may still lose recent writes, stall clients during disk pressure, or discover during an outage that the only copy of your data lives on the same failed volume as the Redis process.

The right Redis data loss strategy starts with one honest question: what happens if the last few seconds, minutes, or hours of writes disappear? A cache of product pages can usually be rebuilt. A session store may annoy users but not destroy the business. A queue, rate-limit ledger, cart store, or feature flag store may need much tighter durability. RDB and AOF are the two tools Redis gives you, and they solve different parts of that problem.

Understanding Redis Persistence Mechanisms

Redis has two main persistence modes:

RDB snapshots

RDB writes a point-in-time snapshot of the dataset to disk, usually as dump.rdb. Redis forks a child process, the child serializes the dataset, and the parent keeps serving clients. That makes RDB useful for backups, replicas, and fast restarts, but it has a clear tradeoff: anything written after the last successful snapshot can be lost if the process or host fails.

Typical RDB settings look like this:

save 900 1       # Save after 15 minutes if at least 1 change was made
save 300 10      # Save after 5 minutes if at least 10 changes were made
save 60 10000    # Save after 1 minute if at least 10000 changes were made
dbfilename dump.rdb
dir /var/lib/redis

Those save lines are not a promise that Redis will save exactly on that schedule. They are trigger rules: if at least that many changes have accumulated since the last successful save and the interval has elapsed, Redis starts a background save. If the disk is slow, the dataset is huge, or the host is low on memory, the background save can fail or create latency through fork and copy-on-write pressure.
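
To make the trigger behavior concrete, here is a minimal Python sketch of how rules like these can be evaluated. It is an illustration of the idea, not Redis source code: the function name should_start_bgsave and the SAVE_RULES list are hypothetical, and details such as retry delays after a failed save are left out.

```python
from typing import List, Tuple

# Each rule is (seconds, changes), mirroring one "save <seconds> <changes>" line.
SAVE_RULES: List[Tuple[int, int]] = [(900, 1), (300, 10), (60, 10000)]


def should_start_bgsave(seconds_since_last_save: float,
                        changes_since_last_save: int,
                        rules: List[Tuple[int, int]] = SAVE_RULES) -> bool:
    """Return True if at least one save rule is satisfied.

    A rule fires only when BOTH conditions hold: more than `seconds`
    have elapsed and at least `changes` writes have accumulated.
    """
    return any(
        seconds_since_last_save > seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


if __name__ == "__main__":
    # 50 changes in 2 minutes: no rule matches yet.
    print(should_start_bgsave(120, 50))
    # 10 changes and just over 5 minutes: the "save 300 10" rule matches.
    print(should_start_bgsave(301, 10))
```

The point of the sketch is that a quiet instance can go a long time without a snapshot, and a busy one is bounded by the tightest rule whose change count it actually reaches.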

RDB is strongest when you want compact snapshots and quick restore behavior. It is weakest when your tolerance for recent data loss is low.

AOF logging

AOF, or append-only file persistence, records write commands so Redis can replay them on restart. It usually gives better durability than RDB because it can flush writes more often than a snapshot schedule. The tradeoff is more disk I/O, larger files before rewrite, and sometimes slower startup if Redis has to replay a large log.

The core settings are:

appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-rewrite-incremental-fsync yes

The important line is appendfsync. With everysec, Redis asks the operating system to flush the AOF roughly once per second. In a normal Redis process crash, that usually limits loss to about the most recent second of writes. In a full host crash or storage failure, the exact loss depends on the OS, filesystem, disk cache, and storage behavior, so do not describe it as a mathematical guarantee.

appendfsync always flushes after every write and is much more expensive. It may be appropriate for a small, critical Redis deployment with modest write volume, but it can hurt latency badly under real traffic. appendfsync no lets the OS decide when to flush; it is fast, but it gives you a much wider and less predictable loss window.

Best Practices for Data Loss Prevention

Use both only when you understand which file Redis will load

Many production Redis deployments enable both RDB and AOF. That is a sensible default when Redis stores data that would be painful to rebuild. RDB gives you compact backup artifacts. AOF gives you a smaller recent-loss window.

Use configuration like this:

save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

One detail matters during restore: when AOF is enabled, Redis normally loads the AOF data on startup because it is expected to be more complete than the RDB snapshot. Do not assume Redis will load RDB first and fall back to AOF. Test the restore path for your Redis version and deployment layout, especially on Redis 7.0 and later, which use the multi-part AOF format: a base file plus incremental files, tracked by a manifest inside the directory named by appenddirname.

Pick appendfsync from the loss window backward

Start with the business impact, not the Redis setting.

If Redis is a disposable cache, RDB alone may be enough, or even no persistence if your application repopulates safely. If Redis contains sessions, appendfsync everysec is often a practical balance. If Redis is part of a workflow where losing acknowledged writes creates real business damage, Redis persistence alone may not be the right durability boundary. You may need a primary database, a durable queue, or application-level write-ahead behavior outside Redis.

For most operational Redis use, start with:

appendonly yes
appendfsync everysec

Then watch latency, disk write time, AOF rewrite behavior, and restart time before deciding whether to move toward always or away from AOF.
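
The reasoning in this section can be written down as a small decision helper. The Python sketch below is only a rough encoding of the guidance above: the function name suggest_persistence, its parameters, and the one-second threshold are assumptions chosen for illustration, not values Redis defines.

```python
from typing import Dict


def suggest_persistence(rebuildable: bool,
                        tolerable_loss_seconds: float) -> Dict[str, str]:
    """Map a workload's loss tolerance to a starting configuration.

    rebuildable: True if the application can repopulate the data safely.
    tolerable_loss_seconds: how many seconds of recent writes may be lost.
    """
    if rebuildable:
        # Disposable cache: snapshots alone, or no persistence at all.
        return {"appendonly": "no",
                "note": "RDB alone, or no persistence, may be enough"}
    if tolerable_loss_seconds >= 1.0:
        # About a second of loss is acceptable: the usual balance.
        return {"appendonly": "yes", "appendfsync": "everysec",
                "note": "watch latency, rewrite behavior, and restart time"}
    # Tighter than roughly one second: always is the strictest Redis
    # setting, but Redis alone may be the wrong durability boundary.
    return {"appendonly": "yes", "appendfsync": "always",
            "note": "consider a durable system of record outside Redis"}


if __name__ == "__main__":
    print(suggest_persistence(rebuildable=False, tolerable_loss_seconds=5))
```

Treat the output as a starting point for measurement, not a final answer.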

Keep RDB snapshots, but do not make them too aggressive

Frequent RDB saves reduce the amount of data lost between snapshots, but they also increase fork frequency. Forking a large Redis process can be expensive, and writes during the child save create copy-on-write memory pressure. If your Redis instance has a 40 GB dataset and the write rate is high, saving every minute may make the instance less reliable, because the host spends too much of its time under memory and disk pressure.

Reasonable RDB save rules depend on write rate and recovery expectations. A small cache can snapshot often without trouble. A large session store may need fewer RDB snapshots plus AOF for recent durability. Watch INFO persistence, Redis logs, and host memory during BGSAVE, not just the Redis configuration file.

Treat AOF rewrite as normal maintenance, not an emergency

AOF files grow because they record writes. Redis rewrites them into a compact representation in the background. The defaults are often a decent starting point:

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

That means Redis considers rewriting when the AOF has grown significantly compared with the size after the previous rewrite, once it is at least the minimum size. If rewrites happen constantly, increase the minimum size or investigate a noisy write pattern. If the AOF grows for a long time without rewriting, check logs, disk space, and INFO persistence.
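
The rule is easier to reason about as arithmetic. The Python sketch below approximates the check under stated assumptions: the function name should_rewrite_aof is hypothetical, the sizes correspond to the aof_current_size and aof_base_size fields reported by INFO persistence, and boundary handling is simplified compared with the real server.

```python
MB = 1024 * 1024


def should_rewrite_aof(current_size: int,
                       base_size: int,
                       percentage: int = 100,
                       min_size: int = 64 * MB) -> bool:
    """Approximate the automatic AOF rewrite trigger.

    current_size: AOF size now, in bytes.
    base_size: AOF size right after the previous rewrite, in bytes.
    percentage: growth over base_size required to trigger (0 disables).
    min_size: never rewrite automatically below this size.
    """
    if percentage == 0:
        return False
    if current_size < min_size:
        return False
    base = base_size if base_size > 0 else 1  # avoid dividing by zero
    growth = current_size * 100 // base - 100
    return growth >= percentage


if __name__ == "__main__":
    # The file doubled from 64 MB to 128 MB: growth is 100 percent.
    print(should_rewrite_aof(128 * MB, 64 * MB))
    # Grew from 64 MB to 100 MB: growth is only 56 percent.
    print(should_rewrite_aof(100 * MB, 64 * MB))
```

With a percentage of 100, the file has to roughly double since the last rewrite before the next one starts, which is why a very small base size combined with a low minimum size can lead to constant rewrites.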

You can trigger a rewrite manually:

redis-cli BGREWRITEAOF

Do that during a quiet period if possible. A rewrite is safer than letting the file grow forever, but it still consumes CPU, disk bandwidth, and copy-on-write memory.

Back up the persistence files somewhere else

Persistence is not a backup. Persistence files on the same host protect you from a Redis process restart. They do not protect you from a lost disk, accidental deletion, a bad deploy that overwrites the data directory, or an operator mistake that runs FLUSHALL.

Copy RDB and AOF files to separate storage. If you use filesystem snapshots or cloud volume snapshots, test restore on a separate instance. For RDB copies, prefer copying a completed snapshot file rather than a file being written. For AOF, understand the file layout for your Redis version before building backup scripts around file names.
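
As one possible shape for the RDB side of this, the Python sketch below copies a snapshot into a backup directory under a timestamped name. It rests on a few assumptions: Redis normally writes a snapshot to a temporary file and renames it into place, so dump.rdb in the data directory should be a completed file; backup_dir is assumed to be a mount on separate storage; and the function name backup_rdb and the naming scheme are hypothetical.

```python
import os
import shutil
import time


def backup_rdb(data_dir: str, backup_dir: str,
               filename: str = "dump.rdb") -> str:
    """Copy the RDB file to backup_dir and return the new path.

    The copy is written under a ".partial" name first and renamed only
    when complete, so a reader of backup_dir never sees a half-copied
    snapshot.
    """
    source = os.path.join(data_dir, filename)
    os.makedirs(backup_dir, exist_ok=True)

    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    final_path = os.path.join(backup_dir, f"{stamp}-{filename}")
    partial_path = final_path + ".partial"

    shutil.copy2(source, partial_path)    # copy contents and timestamps
    os.replace(partial_path, final_path)  # atomic rename on the same filesystem
    return final_path


if __name__ == "__main__":
    # Paths are placeholders; adjust them to your own layout.
    print(backup_rdb("/var/lib/redis", "/mnt/redis-backups"))
```

A copy like this is only half of a backup. The other half is restoring it on a separate instance and checking that the data is what you expect.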

Watch the signals that predict data loss

The useful command during an incident is:

redis-cli INFO persistence

Look for failed background saves (rdb_last_bgsave_status), AOF rewrite status (aof_rewrite_in_progress, aof_last_bgrewrite_status), last save time (rdb_last_save_time), and delayed fsync indicators (aof_delayed_fsync). Pair that with host metrics: disk latency, free disk space, memory headroom, and kernel OOM events. If BGSAVE fails for hours and nobody notices, the Redis configuration may look safe while the actual recovery point gets older and older.
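
Those checks are simple enough to automate. The Python sketch below parses the text printed by redis-cli INFO persistence and returns a list of warnings. The field names are ones Redis reports in that section; the function names, the one-hour snapshot age threshold, and the warning wording are assumptions for illustration.

```python
import time
from typing import Dict, List, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse "key:value" lines, skipping blanks and "# Section" headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def persistence_warnings(fields: Dict[str, str],
                         max_snapshot_age_seconds: int = 3600,
                         now: Optional[float] = None) -> List[str]:
    """Return human-readable warnings for signs of persistence trouble."""
    now = time.time() if now is None else now
    warnings: List[str] = []

    if fields.get("rdb_last_bgsave_status", "ok") != "ok":
        warnings.append("last background save failed")
    if fields.get("aof_last_bgrewrite_status", "ok") != "ok":
        warnings.append("last AOF rewrite failed")
    if fields.get("aof_last_write_status", "ok") != "ok":
        warnings.append("last AOF write failed")
    # aof_delayed_fsync is a counter, so any nonzero value means fsync
    # has been delayed at some point since the server started.
    if int(fields.get("aof_delayed_fsync", "0")) > 0:
        warnings.append("AOF fsync has been delayed")

    last_save = int(fields.get("rdb_last_save_time", "0"))
    if last_save and now - last_save > max_snapshot_age_seconds:
        warnings.append("last successful snapshot is older than the threshold")
    return warnings
```

One way to use it is to pipe the command output into a small script on a schedule and alert when the returned list is not empty, so a failing BGSAVE is noticed in minutes rather than during a restore.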

Use replication or Sentinel for availability, not as your only backup

Replicas, Redis Sentinel, and Redis Cluster help with availability. They do not automatically solve every data loss problem. A bad write, accidental deletion, or application bug can replicate quickly. Failover can also promote a replica that is missing recent writes if replication lag exists. Keep persistence, backups, and restore tests in the design.

The practical setup for many teams is AOF with appendfsync everysec, RDB snapshots at a reasonable cadence, external backups, monitoring on persistence failures, and a tested restore runbook. Redis can be reliable in that shape, but only if you treat persistence as an operational system instead of a checkbox.