Kafka Architecture: Diagram & Components

Stéphane Derosiaux, May 23, 2026

Kafka's architecture is a distributed commit log: producers append records to topics, brokers store those records in partitioned, replicated logs across a cluster, and consumers read from those logs independently at their own pace. This decoupled, durable design lets a single cluster serve high-throughput writes and reads simultaneously while surviving individual broker failures without data loss.

High-Level Architecture

At the top level, a Kafka cluster has three categories of participant:

  • Producers: applications that write records to topics
  • Brokers: servers that store partition replicas and serve requests
  • Consumers: applications (organized into consumer groups) that read records from topics

Kafka cluster architecture with brokers, partitions, leaders, and followers

The cluster itself is coordinated by a controller. In Kafka 4.0+ this is a KRaft controller quorum, not ZooKeeper. The controller tracks which broker leads each partition, handles partition leader elections on broker failure, and stores all cluster metadata.

Core Components

Topics

A topic is a named, append-only log. It is the primary abstraction that producers and consumers interact with. Every record belongs to a topic. Topics are retained independently of consumption, so consumers can replay past records by resetting their offsets.

Key properties:

  • Schema-agnostic: brokers store raw bytes
  • Multi-subscriber: any number of consumer groups can read the same topic independently
  • Configurable retention: time-based (retention.ms) or size-based (retention.bytes)
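
To make the two retention limits concrete, here is a minimal sketch of how a broker might decide which log segments are eligible for deletion. The Segment class and the segments_to_delete function are hypothetical names for illustration, not Kafka APIs, and the sketch assumes that deletion works on whole segments from the oldest end of the log and that the newest (active) segment is never removed.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int        # offset of the first record in the segment
    size_bytes: int
    last_timestamp_ms: int  # timestamp of the newest record in the segment


def segments_to_delete(segments, now_ms, retention_ms=-1, retention_bytes=-1):
    """Return closed segments eligible for deletion, oldest first.

    `segments` is ordered oldest to newest. The last entry is treated as
    the active segment and is never deleted in this sketch. A limit of -1
    disables the corresponding check.
    """
    doomed = []
    total = sum(s.size_bytes for s in segments)
    for seg in segments[:-1]:
        too_old = retention_ms >= 0 and now_ms - seg.last_timestamp_ms > retention_ms
        # Size-based: drop the segment only if the log still meets the
        # size limit after the segment is gone.
        too_big = retention_bytes >= 0 and total - seg.size_bytes >= retention_bytes
        if not (too_old or too_big):
            break  # stop at the first segment that must be kept
        doomed.append(seg)
        total -= seg.size_bytes
    return doomed
```

Because whole segments are the unit of deletion, records can outlive retention.ms until the newest record in their segment has also expired.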

Partitions

A partition is the physical unit of a topic: an ordered, immutable sequence of records. Each replica of a partition lives on a single broker's local disk. Every topic has one or more partitions.

Partitions serve two purposes:

  1. Parallelism: each partition is consumed by at most one consumer per group at a time, so partition count caps maximum consumer parallelism
  2. Distribution: partitions are spread across brokers, distributing I/O and storage load
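
The parallelism cap can be illustrated with a simplified assignment function. This is only a sketch: assign_round_robin is a hypothetical helper, and Kafka's real assignors (range, round-robin, cooperative sticky) are more elaborate. What it shows is that once a group has more consumers than partitions, the extra consumers receive nothing.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to the consumers of one group, one at a time.

    Every partition goes to exactly one consumer, so a group can never
    have more active consumers than there are partitions.
    """
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions shared by a group of four consumers (made-up names).
plan = assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"])
idle = [consumer for consumer, parts in plan.items() if not parts]
```

Here one consumer ends up in idle, which is why adding consumers beyond the partition count does not increase throughput.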

Within a partition, records have a monotonically increasing offset, the mechanism consumers use to track their position.
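
A toy in-memory model makes the offset idea concrete. PartitionLog is a hypothetical class, not part of any Kafka client; it assumes offsets start at 0 and grow by one per record, and it ignores segments, batching, and retention.

```python
class PartitionLog:
    """Toy model of a single partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, value: bytes) -> int:
        """Append a record and return the offset it was assigned."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100):
        """Return (offset, value) pairs starting at `offset`."""
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


log = PartitionLog()
for value in (b"a", b"b", b"c"):
    log.append(value)

# A consumer that has processed offset 0 resumes from offset 1.
position = 1
batch = log.read(position)
position = batch[-1][0] + 1  # next offset to fetch
```

The consumer's position is simply the next offset to read, which is why resetting it to an earlier value replays older records.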

Brokers

A broker is a Kafka server. It stores partition replicas on local disk, handles producer write requests, serves consumer fetch requests, and participates in replication.

Multiple brokers form a cluster. Kafka distributes partition leaders across brokers so that no single broker bears the full write load.

KRaft Controllers

KRaft is Kafka's own Raft-based consensus protocol for storing and managing cluster metadata. It was declared production-ready for new clusters in Kafka 3.3, and since Kafka 4.0, which removed ZooKeeper support, it is the only mode. Controller nodes (which may also act as brokers in combined mode) form a quorum and store cluster state in the internal __cluster_metadata topic.

The controller is responsible for:

  • Assigning partitions to brokers at topic creation
  • Electing new partition leaders when a broker fails
  • Tracking ISR (in-sync replica) membership per partition
  • Persisting topic configuration, ACLs, and broker registrations

Data Flow: End to End

Producer → [hash(key) % partitions] → Partition Leader (Broker N)
                                            ↓ replicate
                                      Followers (Broker M, Broker P)
                                            ↓
                                      Consumer Group (reads from leader)
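
  1. Producer sends a record with an optional key. If a key is present, the producer's partitioner hashes it to select the target partition, so all records with the same key land in the same partition (the basis of per-key ordering) as long as the partition count does not change. Without a key, the default partitioner in current clients fills a batch for one partition and then moves to another (sticky partitioning) rather than alternating per record; older clients used round-robin.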
  2. Partition leader (the broker leading that partition) receives the record, appends it to the local log, and, depending on the acks setting, waits for follower acknowledgment before responding to the producer.
  3. Follower brokers pull new records from the leader and append to their local replica log. Followers that stay within replica.lag.time.max.ms remain in the ISR.
  4. Consumer in a consumer group is assigned a subset of partitions by the group coordinator. It polls the partition leader for new records, processes them, and commits its offset back to Kafka (stored in the __consumer_offsets internal topic).
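
The key-routing step can be sketched in a few lines. This is an illustration under stated assumptions: choose_partition is a hypothetical function, and zlib.crc32 stands in for the hash used by the real default partitioner (murmur2 over the serialized key), so the partition numbers it produces will not match a real cluster. Keyless records are modeled with a caller-supplied fallback partition.

```python
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int, fallback: int = 0) -> int:
    """Pick a partition for a record.

    Keyed records: a stable hash of the key modulo the partition count,
    so equal keys always map to the same partition.
    Keyless records: whatever partition the caller's strategy supplies.
    """
    if key is None:
        return fallback % num_partitions
    # crc32 is an assumption standing in for Kafka's murmur2 hash.
    return zlib.crc32(key) % num_partitions


first = choose_partition(b"user-42", 6)
second = choose_partition(b"user-42", 6)
same_partition = first == second  # equal keys route identically
```

Note that the mapping depends on the partition count: adding partitions to a topic changes where existing keys land.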

Partition Replication Layout

Partition replication across brokers with leader/follower roles

For a 3-partition topic with replication-factor 3 across 3 brokers, Kafka spreads leadership and replicas evenly:

Partition 0:  Leader=Broker 1  │  Followers=Broker 2, Broker 3
Partition 1:  Leader=Broker 2  │  Followers=Broker 1, Broker 3
Partition 2:  Leader=Broker 3  │  Followers=Broker 1, Broker 2

If Broker 1 fails, Broker 2 or 3 (whichever is in the ISR for partition 0) is elected leader by the KRaft controller. Producers and consumers discover the leadership change via metadata refresh and reconnect, typically within seconds.
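
The layout and the failover rule can be sketched together. Both functions are hypothetical simplifications: layout rotates the broker list to reproduce the table above (the real assignment logic also randomizes its starting point and can take racks into account), and elect_leader assumes the new leader is the first replica in assignment order that is both alive and in the ISR.

```python
def layout(num_partitions, brokers, replication_factor):
    """Rotate the broker list so leadership is spread evenly.

    Returns {partition: [replica brokers]}, where the first replica in
    each list is the preferred leader.
    """
    plan = {}
    for p in range(num_partitions):
        start = p % len(brokers)
        plan[p] = [brokers[(start + i) % len(brokers)]
                   for i in range(replication_factor)]
    return plan


def elect_leader(replicas, isr, live_brokers):
    """Return the first live, in-sync replica, or None if there is none."""
    for broker in replicas:
        if broker in isr and broker in live_brokers:
            return broker
    return None  # partition stays offline rather than losing data


plan = layout(3, [1, 2, 3], 3)
# Broker 1 fails: partition 0 needs a new leader from its ISR.
new_leader = elect_leader(plan[0], isr={1, 2, 3}, live_brokers={2, 3})
```

Returning None when no in-sync replica is alive corresponds to refusing an unclean election: availability is traded for not electing a replica that is missing records.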

KRaft Architecture (Kafka 4.0+)

KRaft vs ZooKeeper architecture comparison

Before KRaft, Kafka relied on an external ZooKeeper ensemble for metadata coordination. ZooKeeper was a separate cluster to manage, monitor, and upgrade, and it capped practical cluster size at around 200K partitions.

KRaft replaces this with an internal Raft quorum of controller nodes. Benefits:

                     │  ZooKeeper mode       │  KRaft mode (4.0+)
Controller election  │  Seconds              │  Milliseconds
Max partitions       │  ~200K                │  Millions
External dependency  │  ZooKeeper cluster    │  None
Metadata storage     │  ZooKeeper znodes     │  __cluster_metadata topic
Auth model           │  Split (ZK + Kafka)   │  Unified

For production KRaft deployments, Confluent and the Apache Kafka team recommend a quorum of 3 or 5 controller nodes (an odd number, so that a majority vote is always possible). Controllers can be co-located with brokers (combined mode) or, for large clusters, run as dedicated nodes (isolated mode).
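
The preference for an odd number of controllers follows from majority arithmetic, which this small sketch works through. The quorum_stats function is a hypothetical helper; it assumes only that a Raft-style quorum needs a strict majority of voters to make progress.

```python
def quorum_stats(voters: int):
    """Return (majority size, controller failures tolerated)."""
    majority = voters // 2 + 1
    return majority, voters - majority


# Compare a few quorum sizes.
stats = {n: quorum_stats(n) for n in (3, 4, 5)}
```

Going from 3 to 4 voters raises the required majority from 2 to 3 but still tolerates only one failure, so the fourth node adds cost without adding resilience.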

See Understanding KRaft Mode in Kafka for migration and configuration details.

Tiered Storage Architecture (Kafka 3.6+)

Tiered storage: hot tier on broker disk, cold tier on object storage

Kafka 3.6 introduced tiered storage (initially as an early-access feature), decoupling data retention from broker disk capacity:

  • Hot tier: recent log segments on broker local disk, low latency, fast access
  • Cold tier: older segments offloaded to object storage (S3, GCS, Azure Blob Storage), ~90% cheaper per GB than local disk

Consumers are unaware of which tier serves their request. The broker transparently fetches cold segments from object storage and streams them to the consumer.
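
The routing decision a broker makes on each fetch can be sketched as a comparison of offsets. The serving_tier function and its parameter names are hypothetical; the sketch assumes the broker knows the earliest offset still held on local disk and serves anything older from object storage.

```python
def serving_tier(fetch_offset, log_start_offset, local_log_start_offset, log_end_offset):
    """Decide which tier would serve a fetch at `fetch_offset`.

    log_start_offset:        oldest offset retained in any tier
    local_log_start_offset:  oldest offset still on the broker's disk
    log_end_offset:          offset the next appended record will get
    """
    if not log_start_offset <= fetch_offset < log_end_offset:
        # Outside the retained range there is nothing to serve in this sketch.
        raise ValueError("offset out of range")
    return "local" if fetch_offset >= local_log_start_offset else "remote"


# A replaying consumer starts in the cold tier and catches up to the hot tier.
tiers = [serving_tier(o, 0, 1000, 2000) for o in (50, 999, 1000, 1999)]
```

The consumer issues the same fetch request in every case; only the broker's read path differs.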

This enables months or years of retention without scaling broker disk proportionally. Before tiered storage, a 90-day retention window meant each broker had to keep the full 90 days of data on local disk for every partition replica it hosted.
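
A back-of-the-envelope calculation shows the effect on broker disk. The workload numbers below (10 MB/s ingest, replication factor 3, 6 brokers, 2 days kept locally) are made up for illustration, and the local_disk_gb_per_broker helper is hypothetical; it assumes data is spread evenly across brokers and ignores compression, indexes, and headroom.

```python
SECONDS_PER_DAY = 86_400


def local_disk_gb_per_broker(ingest_mb_per_s, days_on_local_disk,
                             replication_factor, brokers):
    """Estimate the local disk each broker needs, in GB."""
    total_mb = (ingest_mb_per_s * SECONDS_PER_DAY
                * days_on_local_disk * replication_factor)
    return total_mb / 1024 / brokers


# All 90 days on broker disk versus only the most recent 2 days.
without_tiering = local_disk_gb_per_broker(10, 90, 3, 6)
with_tiering = local_disk_gb_per_broker(10, 2, 3, 6)
reduction = without_tiering / with_tiering
```

Under these assumptions the local footprint shrinks by the ratio of the two windows, 90 / 2 = 45 times, while the full 90 days remains readable from object storage.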

For configuration and best practices, see Tiered Storage in Kafka.

Architecture Enabling Properties

The combination of topics, partitions, brokers, and KRaft gives Kafka its core guarantees:

Property                 │  Mechanism
Horizontal scalability   │  Add brokers; reassign partitions to distribute load
Fault tolerance          │  Partition replication; ISR-based leader election
Per-key ordering         │  Key-based partition routing; per-partition ordering
Independent consumption  │  Each consumer group tracks its own offsets
Replay                   │  Retention-based log; consumers can reset offsets
High throughput          │  Sequential disk I/O; batching; compression

Kafka in the Streaming Ecosystem

Kafka's architecture makes it the hub of modern data platforms:

  • Stream processing: Kafka Streams, Apache Flink, and Apache Spark Structured Streaming consume topics, transform records, and write to output topics
  • Data integration: Kafka Connect moves data between Kafka topics and external systems (databases, S3, Elasticsearch)
  • Event-driven microservices: services publish domain events to topics and subscribe to events from other services
  • Analytics pipelines: operational topics feed data lakes and warehouses via connectors or stream processors

For more on Kafka's role in data platforms, see Apache Kafka and Kafka Producers and Consumers.

Sources