Kafka Data Retention: Understanding and Managing Your Event Streams
Manage Kafka data retention with retention.ms, retention.bytes, topic overrides, compaction basics, and disk monitoring tips.
Kafka data retention answers a practical question: how long should your event streams stay on disk before Kafka can delete them? If your settings are too loose, brokers can run out of space. If they are too aggressive, a slow consumer may lose the chance to replay data.
Kafka stores records in partition logs. Those logs are split into segment files, and retention cleanup deletes old closed segments. That detail matters because Kafka does not delete one record at a time the moment it becomes old. A segment becomes eligible for deletion only when the whole segment meets the configured retention rules, so individual records can stay on disk somewhat longer than the retention setting suggests.
Why Retention Settings Matter
Retention is a tradeoff between storage, replay needs, and operational risk.
- Storage cost: Long retention on high-volume topics can consume a lot of broker disk.
- Consumer recovery: Your retention window must be longer than the longest realistic consumer outage or reprocessing window.
- Stability: Full disks can stop brokers from accepting writes and can trigger wider cluster trouble.
- Compliance: Some data must be kept for a minimum period, while other data should be removed quickly.
A payments topic might need several days of replay history. A debug-log topic in a development cluster might only need a few hours.
How Kafka Deletes Old Data
Kafka topics are divided into partitions. Each partition is an ordered append-only log. Kafka writes new records to the active segment and rolls to a new segment when the current one reaches a configured size (segment.bytes) or age (segment.ms).
Retention applies per partition. If you set retention.bytes=1073741824, that is roughly 1 GiB per partition, not 1 GiB for the whole topic. A topic with 12 partitions can therefore keep about 12 GiB before replicas are counted.
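The per-partition arithmetic can be sketched in a few lines of shell. The function name topic_retention_bytes is hypothetical, not a Kafka tool, and the result is only a rough estimate because indexes and segment granularity add overhead.

```shell
# Rough size-retention footprint for a topic (hypothetical helper, not a Kafka tool).
# Arguments: retention.bytes per partition, partition count, replication factor.
topic_retention_bytes() {
  echo $(( $1 * $2 * $3 ))
}

# 1 GiB per partition, 12 partitions, before replicas are counted (factor 1)
topic_retention_bytes 1073741824 12 1   # 12884901888 bytes, about 12 GiB
# The same topic with replication factor 3
topic_retention_bytes 1073741824 12 3   # 38654705664 bytes, about 36 GiB
```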
When both time-based and size-based retention are configured, the two limits work independently: Kafka deletes an old segment as soon as either limit calls for it, so the stricter setting determines how much data survives.
Time-Based Retention
Time-based retention keeps records for a configured period.
At the broker level, log.retention.ms sets the default for topics that do not override it. At the topic level, retention.ms overrides that default for one topic.
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name orders \
--alter \
--add-config retention.ms=259200000
That example sets orders to three days. Verify it with:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name orders \
--describe
Use time retention when your main requirement is a replay window, such as "consumers must be able to recover from a weekend outage."
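Because retention.ms is expressed in milliseconds, it is easy to get the number of zeros wrong. A minimal sketch of the conversion, using hypothetical helper names:

```shell
# Convert a replay window to milliseconds for retention.ms (hypothetical helpers).
days_to_ms()  { echo $(( $1 * 24 * 60 * 60 * 1000 )); }
hours_to_ms() { echo $(( $1 * 60 * 60 * 1000 )); }

days_to_ms 3    # 259200000, the value used for the orders topic
days_to_ms 7    # 604800000
hours_to_ms 6   # 21600000
```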
Size-Based Retention
Size-based retention caps how much log data each partition can keep.
At the broker level, log.retention.bytes sets the default per-partition limit. At the topic level, retention.bytes overrides it for one topic. A value of -1 means no size limit from that setting.
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name high-volume-logs \
--alter \
--add-config retention.bytes=1073741824
That sets a 1 GiB limit per partition. Use size retention when disk protection matters more than a fixed time window. Be careful with bursty topics, because a traffic spike can shorten the effective replay window.
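To see how a spike shrinks the replay window, divide the size limit by the write rate. This is a rough sketch with a hypothetical helper name. It assumes a steady per-partition write rate and ignores segment granularity, so treat the numbers as approximations.

```shell
# Approximate replay window in seconds for a size-limited partition.
# Arguments: retention.bytes, per-partition write rate in bytes per second.
# Hypothetical helper; assumes a steady rate and ignores segment boundaries.
replay_window_seconds() {
  echo $(( $1 / $2 ))
}

replay_window_seconds 1073741824 1048576   # 1024 seconds at 1 MiB/s
replay_window_seconds 1073741824 4194304   # 256 seconds during a 4 MiB/s spike
```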
Broker Defaults and Topic Overrides
Broker defaults live in server.properties and apply to topics that do not set their own retention values.
log.retention.ms=604800000
log.retention.bytes=-1
log.retention.check.interval.ms=300000
Changing broker defaults usually requires a broker restart or a dynamic broker configuration change, depending on your Kafka version and deployment tooling. Topic-level changes through kafka-configs.sh are often safer: they take effect without a restart and touch only one topic, and different topics rarely need the same retention window anyway.
For a new topic, set retention when you create it:
kafka-topics.sh --bootstrap-server localhost:9092 \
--create \
--topic audit-events \
--partitions 6 \
--replication-factor 3 \
--config retention.ms=604800000 \
--config retention.bytes=2147483648
For an existing topic, alter the topic config:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name audit-events \
--alter \
--add-config retention.ms=1209600000
To fall back to the broker default, delete the topic override:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name audit-events \
--alter \
--delete-config retention.ms
Retention and Log Compaction
Kafka cleanup is controlled by cleanup.policy. The common values are delete, compact, or both as compact,delete.
- delete removes old log segments based on retention time or size.
- compact keeps the latest value for each key and removes older values for that key over time.
- compact,delete allows both compaction and deletion rules to apply.
Compaction is useful for changelog-style topics, such as customer profile updates keyed by customer ID. It is not a general replacement for retention. Tombstones and delete retention have their own timing behavior, so test compacted topics before relying on them for cleanup.
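The "latest value per key" idea can be illustrated with a toy model in shell. This is only a sketch of the end result, not how Kafka's log cleaner works internally. It ignores tombstones and timing, and the key=value input format and function name are assumptions made for the example.

```shell
# Toy model of compaction: keep only the latest record for each key.
# Input lines are "key=value" in log order. Illustration only; this is
# not how the Kafka log cleaner is implemented, and tombstones are ignored.
compact_latest() {
  awk -F= '
    { key[NR] = $1; line[NR] = $0; last[$1] = NR }   # remember last position per key
    END {
      for (i = 1; i <= NR; i++)
        if (last[key[i]] == i) print line[i]         # keep a record only if it is the newest for its key
    }
  '
}

printf '%s\n' 'c1=alice@old.example' 'c2=bob@example' 'c1=alice@new.example' | compact_latest
```

The older c1 record is dropped, while the surviving records keep their original relative order.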
Practical Retention Checklist
Start with the longest consumer outage you can tolerate. If a consumer group might be offline for 48 hours, a 24-hour retention window is too short.
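That comparison is simple enough to script as a pre-change check. A minimal sketch, with a hypothetical function name:

```shell
# Succeeds (exit status 0) when retention.ms is longer than the outage to survive.
# Hypothetical helper. Arguments: retention in ms, outage in hours.
retention_covers_outage() {
  [ "$1" -gt $(( $2 * 60 * 60 * 1000 )) ]
}

# 24-hour retention versus a 48-hour outage
if retention_covers_outage 86400000 48; then
  echo "ok"
else
  echo "retention too short"
fi
```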
Estimate disk needs before changing production topics:
per-partition ingest rate (bytes per second) x seconds retained x partition count x replication factor
That is only an estimate because compression, record size, indexes, and segment overhead affect the real number. Still, it gives you a useful starting point.
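As a worked example of the formula above, here is a small shell sketch. It assumes the ingest rate is measured per partition in bytes per second. The function name and the sample numbers are hypothetical.

```shell
# Rough disk estimate following the formula above (hypothetical helper).
# Arguments: bytes/sec per partition, seconds retained, partitions, replication factor.
estimate_disk_bytes() {
  echo $(( $1 * $2 * $3 * $4 ))
}

# 100 KiB/s per partition, 3 days, 6 partitions, replication factor 3
bytes=$(estimate_disk_bytes 102400 259200 6 3)
echo "$bytes bytes"                      # 477757440000 bytes
echo "$(( bytes / 1073741824 )) GiB"     # 444 GiB (integer division)
```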
Monitor broker disk usage, topic growth rate, under-replicated partitions, and consumer lag together. Disk pressure plus rising lag is a warning that consumers may fall outside the retention window.
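For the disk side of that monitoring, a basic check can be scripted with standard tools. The sketch below relies on POSIX df -P. The 80% threshold and the LOG_DIR variable are assumptions for illustration, so point LOG_DIR at your broker's log directory. A real deployment would more likely feed this into an existing metrics system.

```shell
# Print the used-space percentage of the filesystem holding a path.
# Relies on the POSIX "df -P" output format, where column 5 is the capacity.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Hypothetical threshold and path: warn when the volume is over 80% full.
LOG_DIR="${LOG_DIR:-/}"
pct=$(disk_used_pct "$LOG_DIR")
if [ "$pct" -gt 80 ]; then
  echo "WARNING: $LOG_DIR is ${pct}% full"
fi
```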
The safest default is simple: use topic-level retention, document why each high-volume topic has its setting, and test shorter retention in staging before applying it to production.