Kafka Architecture Explained: Core Components and Their Roles

Explore the fundamental building blocks of Apache Kafka's distributed event streaming architecture. This guide explains the roles of Kafka brokers, topics, partitions, producers, and consumers, and the cluster metadata layer (ZooKeeper or KRaft). Learn how these components interact to provide high-throughput, fault-tolerant data processing and storage, which is essential knowledge for any Kafka implementation.

Kafka architecture can look confusing at first because the same system handles storage, streaming, replication, and consumer progress. Once you separate the main parts, the model becomes much easier: producers write records to topic partitions, brokers store those partitions, and consumers read records by offset.

This guide explains the core Kafka components and how they work together in a real cluster.

Brokers: The Kafka Servers

A Kafka cluster is made of one or more brokers. A broker is a Kafka server that stores partition data and handles client requests from producers and consumers.

When a producer sends a record, it writes to the broker that currently leads the target partition. When a consumer reads records, it fetches them from that partition's leader by default. In normal setups, each broker leads some partitions and holds follower replicas of others, across many topics.

Adding brokers can increase storage capacity and spread traffic, but it does not automatically fix every bottleneck. You also need enough partitions, balanced replica placement, healthy disks, and network capacity.

Topics: Named Streams of Records

A topic is a named stream of records, such as orders, payments, or user_activity. Producers write to topics, and consumers subscribe to topics.

A topic is split into partitions. Each partition is an ordered, append-only log. Kafka guarantees record order within a single partition, not across the whole topic.

That detail matters. If all events for one customer must be processed in order, use a stable key such as customer_id. Kafka's default partitioner hashes the key to choose a partition, so records with the same key go to the same partition as long as the topic's partition count does not change.
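The key-to-partition idea can be sketched in a few lines of Python. This is a toy model, not the client's real algorithm: the function name choose_partition is made up for illustration, and zlib.crc32 stands in for the murmur2 hash that Kafka's Java client applies to the serialized key, so actual partition numbers will differ.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index with a stable hash.

    Assumption: crc32 is only a stand-in for Kafka's murmur2-based
    default partitioner. The principle (hash modulo partition count)
    is the same, the resulting numbers are not.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("customer-42", "order created"),
    ("customer-7", "order created"),
    ("customer-42", "order paid"),
]

for key, value in events:
    # Both customer-42 events land on the same partition, so they stay ordered.
    print(key, value, "-> partition", choose_partition(key, 3))
```

Because the result depends on num_partitions, adding partitions to a topic later can send an existing key to a different partition.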

Partitions and Offsets

Each record in a partition gets an offset. The offset is a number that identifies the record's position in that partition.

For example, a topic named orders with three partitions might look like this:

orders-0: offset 0, offset 1, offset 2
orders-1: offset 0, offset 1
orders-2: offset 0, offset 1, offset 2, offset 3

Offsets are only meaningful inside their own partition. Offset 3 in orders-2 is not related to offset 3 in another partition.
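A small in-memory model can make per-partition offsets concrete. The TopicLog class below is hypothetical and exists only for this illustration; it treats each partition as a Python list and uses the list index as the offset.

```python
class TopicLog:
    """Toy in-memory topic: one append-only list per partition."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        """Append a record and return its offset within that partition."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> list:
        """Return all records in the partition from the given offset onward."""
        return self.partitions[partition][offset:]


orders = TopicLog("orders", 3)

# Reproduce the layout above: 3, 2, and 4 records in partitions 0, 1, and 2.
for partition, count in enumerate([3, 2, 4]):
    for n in range(count):
        orders.append(partition, f"record-{partition}-{n}")

print(orders.read(2, 3))          # the one record at offset 3 in orders-2
print(len(orders.partitions[1]))  # orders-1 holds two records, so it has no offset 3
```

Each partition counts from zero on its own, which is why an offset only identifies a record together with its partition number.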

Partitions give Kafka parallelism. More partitions allow more consumers in the same consumer group to work at the same time, up to one active consumer per partition within that group.

Replication and Leaders

Kafka uses replication to keep data available when a broker fails. Each partition can have multiple replicas on different brokers.

One replica is the leader. Producers and consumers normally talk to the leader for that partition. The other replicas are followers. Followers copy data from the leader and stay ready to take over if the leader fails.

The replication factor controls how many copies Kafka keeps. A replication factor of 3 means Kafka stores three copies of each partition, each on a different broker. Kafka rejects topic creation if the cluster has fewer brokers than the requested replication factor.
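Leader failover can be sketched as a simple selection rule. The function below is an illustrative assumption, not the controller's actual code: it picks the first replica in the assignment list that is both alive and in sync, and it returns None when no such replica exists, which models a partition going offline when unclean leader election is disabled.

```python
def elect_leader(replicas, in_sync, failed):
    """Return the first live, in-sync replica, or None if there is none.

    replicas: broker ids in assignment order, for example [1, 2, 3]
    in_sync:  set of broker ids currently in the in-sync replica set
    failed:   set of broker ids that are down
    """
    for broker in replicas:
        if broker in in_sync and broker not in failed:
            return broker
    return None


replicas = [1, 2, 3]

print(elect_leader(replicas, {1, 2, 3}, set()))  # healthy cluster
print(elect_leader(replicas, {1, 2, 3}, {1}))    # broker 1 fails, a follower takes over
print(elect_leader(replicas, {1}, {1}))          # the only in-sync replica is down
```

The last case shows why replication alone is not enough: a follower that has fallen out of sync is not a safe candidate, because promoting it could lose acknowledged records.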

You can create a topic like this:

kafka-topics.sh --create \
  --topic user_activity \
  --bootstrap-server localhost:9092 \
  --partitions 3 \
  --replication-factor 3

That command requires a cluster with at least three brokers. On a single-broker local setup, use a replication factor of 1.

Producers: Applications That Write Events

Producers send records to Kafka topics. A record can include a key, value, timestamp, and headers.

The producer first asks the cluster for metadata so it knows which broker leads each partition. Then it sends records directly to the right broker.

Producer reliability depends heavily on settings such as:

Setting             What it affects
acks                How many broker acknowledgments are required before a write counts as successful
retries             Whether the producer retries transient failures
enable.idempotence  Helps avoid duplicates caused by producer retries
compression.type    Reduces network and disk usage for many workloads

For important data, acks=all is common because the leader waits for in-sync replicas before acknowledging the write. Exact behavior also depends on broker settings such as min.insync.replicas.
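The interaction between acks and min.insync.replicas can be modeled as a small decision function. This is a simplified sketch: write_outcome and its return strings are invented for illustration, and real brokers signal the rejected case with an error response rather than a string.

```python
def write_outcome(acks: str, isr_size: int, min_insync_replicas: int) -> str:
    """Describe what happens to a write under a given acks setting.

    isr_size counts the in-sync replicas, leader included.
    min.insync.replicas is only enforced when acks is "all".
    """
    if acks == "0":
        return "sent without waiting for any acknowledgment"
    if acks == "1":
        return "acknowledged once the leader has written it"
    if acks == "all":
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "acknowledged once all in-sync replicas have it"
    raise ValueError(f"unsupported acks value: {acks!r}")


# Replication factor 3 with min.insync.replicas=2:
print(write_outcome("all", isr_size=3, min_insync_replicas=2))
print(write_outcome("all", isr_size=2, min_insync_replicas=2))
print(write_outcome("all", isr_size=1, min_insync_replicas=2))
print(write_outcome("1", isr_size=1, min_insync_replicas=2))
```

With these example values the topic tolerates one failed replica and keeps accepting acks=all writes. After a second failure it rejects them instead of accepting a write that exists on only one broker.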

Consumers and Consumer Groups

Consumers read records from topics. Most production consumers run inside a consumer group.

Within one consumer group, Kafka assigns each partition to only one active consumer at a time. That is how Kafka lets you scale processing while preserving order within each partition.

For example, if orders has three partitions and your service has three consumers in the same group, each consumer can process one partition. If you add a fourth consumer to the same group, one consumer will sit idle because there are only three partitions to assign.
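The example above can be reproduced with a toy round-robin assignment. This is a simplified stand-in: Kafka ships several assignment strategies (range, round-robin, sticky) that differ in detail, and the function and consumer names here are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions across the consumers of one group, round-robin.

    Each partition goes to exactly one consumer. Consumers beyond the
    partition count end up with an empty list, meaning they sit idle.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["orders-0", "orders-1", "orders-2"]

print(assign_partitions(partitions, ["c1", "c2", "c3"]))
print(assign_partitions(partitions, ["c1", "c2", "c3", "c4"]))  # c4 gets nothing
print(assign_partitions(partitions, ["c1", "c2"]))              # c1 takes two partitions
```

The idle fourth consumer is not wasted in practice: it can take over a partition if another group member fails.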

Different consumer groups get independent reads. Your billing service and analytics service can both read the orders topic without stealing records from each other.

Offsets and Consumer Progress

Consumers track progress by committing offsets. Kafka stores committed offsets for consumer groups in an internal topic named __consumer_offsets.

If a consumer crashes and restarts, it uses the committed offset to resume. The timing of commits affects processing behavior:

Commit timing                      Possible result
Commit before processing finishes  A crash can skip records
Commit after processing finishes   A crash can reprocess records

Many systems choose at-least-once processing: process the record, then commit the offset. That can create duplicates after a crash, so downstream writes should be idempotent when possible.
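The two commit orderings can be compared with a small crash simulation. Everything here is a toy model with made-up names: the consumer handles one record at a time in two steps, and the crash happens between those steps for one chosen offset. As in Kafka, the committed offset is the position of the next record to read.

```python
def run_until_crash(records, start, crash_at, commit_first):
    """Process records from `start`; crash between the two steps at `crash_at`.

    Returns (processed_records, committed_offset). Pass crash_at=None
    to run to the end without crashing.
    """
    processed, committed = [], start
    for offset in range(start, len(records)):
        # Step 1
        if commit_first:
            committed = offset + 1
        else:
            processed.append(records[offset])
        if offset == crash_at:
            break  # crash after step 1, before step 2
        # Step 2
        if commit_first:
            processed.append(records[offset])
        else:
            committed = offset + 1
    return processed, committed


def simulate(records, crash_at, commit_first):
    """Run once with a crash, restart from the committed offset, finish."""
    first, committed = run_until_crash(records, 0, crash_at, commit_first)
    second, _ = run_until_crash(records, committed, None, commit_first)
    return first + second


records = ["r0", "r1", "r2", "r3"]

print(simulate(records, crash_at=2, commit_first=True))   # r2 is never processed
print(simulate(records, crash_at=2, commit_first=False))  # r2 is processed twice
```

Committing first gives at-most-once behavior, and committing last gives at-least-once behavior. The duplicate in the second run is the case that idempotent downstream writes are meant to absorb.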

Cluster Metadata: ZooKeeper and KRaft

Older Kafka clusters use Apache ZooKeeper to manage cluster metadata and controller election. Many existing installations still run this way.

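Newer Kafka deployments use KRaft mode, Kafka's built-in metadata quorum. In KRaft clusters, Kafka no longer depends on ZooKeeper for metadata management. KRaft has been considered production-ready since Kafka 3.3, and Kafka 4.0 removed ZooKeeper mode entirely.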

When you read older Kafka tutorials, check whether they assume ZooKeeper or KRaft. Commands, configuration files, and operational steps can differ.

How a Record Moves Through Kafka

A typical write and read flow looks like this:

  1. A producer connects to a bootstrap broker and fetches metadata.
  2. The producer chooses a partition based on the record key or partitioning strategy.
  3. The producer sends the record to the leader broker for that partition.
  4. The leader appends the record to its log and followers replicate it.
  5. The leader acknowledges the write based on the producer's acks setting.
  6. A consumer polls the partition and receives records starting from its current offset.
  7. The consumer processes records and commits offsets for its consumer group.

This flow is why Kafka can support both real-time processing and replay. Consumers do not remove records when they read them.
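Replay and independent reads follow directly from keeping the log and the read positions separate. The sketch below is a toy model with hypothetical names: the log is a plain list, and each group has its own position. In a real client, a replay would be done by seeking or by resetting the group's committed offsets.

```python
def poll(log, positions, group, max_records=10):
    """Return the next batch for a group and advance only that group's position."""
    start = positions[group]
    batch = log[start:start + max_records]
    positions[group] = start + len(batch)
    return batch


log = ["order-created", "order-paid", "order-shipped"]
positions = {"billing": 0, "analytics": 0}

billing_batch = poll(log, positions, "billing")                    # reads everything
analytics_batch = poll(log, positions, "analytics", max_records=1)  # reads one record

positions["billing"] = 0  # replay: rewind this group's position only
replayed = poll(log, positions, "billing")

print(billing_batch == replayed)  # the same records are still there
print(positions["analytics"])     # the other group's progress is untouched
print(len(log))                   # reading removed nothing from the log
```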

Retention: Kafka Keeps Data by Policy

Kafka is not a traditional queue where a message disappears as soon as one consumer reads it. Kafka keeps records based on retention settings.

Common topic settings include:

retention.ms=604800000
retention.bytes=10737418240

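retention.ms controls time-based retention. retention.bytes controls size-based retention and applies to each partition, not to the topic as a whole. Kafka deletes whole log segments rather than individual records, so actual cleanup also depends on segment settings and broker configuration.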
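The two values above are easier to read after converting units, and the cleanup rule can be approximated with a short function. The function is a simplified assumption about delete-based retention, not the broker's implementation: it walks closed segments oldest first, never touches the active (last) segment, and merges the time and size checks into one pass.

```python
from datetime import timedelta

print(timedelta(milliseconds=604_800_000))  # 7 days, 0:00:00
print(10_737_418_240 / 1024**3)             # 10.0 (GiB)


def expired_segments(segments, now_ms, retention_ms, retention_bytes):
    """Return indexes of segments a toy cleaner would delete.

    segments: oldest first, each a (last_record_timestamp_ms, size_bytes)
    tuple. The final entry is the active segment and is always kept.
    """
    total = sum(size for _, size in segments)
    expired = []
    for index, (last_ts, size) in enumerate(segments[:-1]):
        too_old = now_ms - last_ts > retention_ms
        # Size rule: delete only while the remaining log still meets the limit.
        over_size = total - size >= retention_bytes
        if not (too_old or over_size):
            break  # cleanup stops at the first segment worth keeping
        expired.append(index)
        total -= size
    return expired


# Small made-up numbers: retention of 1000 ms and 100 bytes, current time 2000.
by_age = [(0, 50), (500, 50), (1500, 50), (1900, 10)]
by_size = [(1900, 60), (1950, 60), (1990, 60)]

print(expired_segments(by_age, 2000, 1000, 100))   # [0, 1]
print(expired_segments(by_size, 2000, 1000, 100))  # [0]
```

In the second case the partition keeps 120 bytes even though the limit is 100, because removing another whole segment would drop it below the limit. That is one reason observed disk usage rarely matches retention.bytes exactly.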

Some topics use log compaction instead of, or alongside, delete-based retention. Compaction keeps the latest value for each key, which is useful for state-like topics such as user profiles or configuration changes.
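The effect of compaction on a single partition can be sketched as follows. This is a toy model that ignores timing, segments, and delete markers; the compact function and the sample records are made up for illustration. Surviving records keep their original offsets, which is also how Kafka behaves.

```python
def compact(records):
    """Keep only the most recent record for each key.

    records: list of (key, value) tuples in offset order.
    Returns (offset, key, value) tuples for the survivors, in offset order.
    """
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)  # a later record replaces an earlier one
    return sorted((offset, key, value) for key, (offset, value) in latest.items())


profile_updates = [
    ("user-1", "email=a@example.com"),
    ("user-2", "email=b@example.com"),
    ("user-1", "email=new@example.com"),
]

for offset, key, value in compact(profile_updates):
    print(offset, key, value)
```

After compaction the partition has gaps in its offsets (offset 0 is gone here), and a consumer that reads from the beginning still ends up with the current value for every key.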

What to Remember

Kafka's architecture is built around partitioned logs. Brokers store partitions, producers write to partition leaders, consumers read by offset, and consumer groups split work across partitions.

When you design a Kafka topic, think about ordering, partition count, replication factor, retention, and consumer group behavior together. Those choices shape how your system scales, recovers from failures, and replays old events.