Glossary

Kafka Topics, Partitions, and Brokers: Core Architecture

Understand Kafka's core architecture: topics, partitions, and brokers. Learn how these components enable scalable, fault-tolerant distributed streaming.

Kafka Topics, Partitions, and Brokers: Core Architecture

Apache Kafka's architecture is built on three fundamental components: topics, partitions, and brokers. Understanding how these elements interact is essential for designing robust data streaming systems. This article explores each component and explains how they work together to deliver Kafka's core capabilities: scalability, fault tolerance, and high throughput.

Topics: The Logical Organization Layer

A topic in Kafka represents a logical stream of records. Think of it as a category or feed name to which producers publish data and from which consumers read data. Topics are append-only logs that store records in the order they arrive.

Each topic has a name (e.g., user-events, payment-transactions) and can handle different types of data. Topics are schema-agnostic at the broker level, though producers and consumers typically agree on a data format (JSON, Avro, Protobuf).

Key characteristics of topics:

  • Append-only: Records are always added to the end of the log

  • Immutable: Once written, records cannot be modified

  • Retention-based: Records are retained based on time or size limits, not consumption

  • Multi-subscriber: Multiple consumer groups can independently read the same topic

Partitions: The Scalability Mechanism

While topics provide logical organization, partitions enable Kafka's scalability. Each topic is split into one or more partitions, which are the fundamental unit of parallelism and distribution in Kafka.

How Partitions Work

Each partition is an ordered, immutable sequence of records. Records within a partition have a unique sequential ID called an offset. When a producer sends a message to a topic, Kafka assigns it to a specific partition based on:

  1. Explicit partition: Producer specifies the partition number

  2. Key-based routing: Hash of the message key determines the partition (guarantees ordering for the same key)

  3. Round-robin: If no key is provided, messages are distributed evenly

Topic: user-events (3 partitions)

Partition 0: [msg0, msg3, msg6, msg9]  offset: 0, 1, 2, 3
Partition 1: [msg1, msg4, msg7]        offset: 0, 1, 2
Partition 2: [msg2, msg5, msg8]        offset: 0, 1, 2

Partition Count Trade-offs

Choosing the right number of partitions involves several considerations:

More partitions enable:

  • Higher parallelism (more consumers can read simultaneously)

  • Better throughput distribution across brokers

  • Finer-grained scaling

But too many partitions can cause:

  • Longer leader election times during failures

  • More memory overhead per partition

  • Increased latency for end-to-end replication

  • Higher ZooKeeper/KRaft metadata overhead

A common starting point: calculate partitions based on desired throughput divided by single-partition throughput, typically resulting in 6-12 partitions per topic for moderate-scale systems.

Brokers: The Storage and Coordination Layer

A broker is a Kafka server that stores data and serves client requests. A Kafka cluster consists of multiple brokers working together to distribute the load and provide fault tolerance.

Broker Responsibilities

Each broker in a cluster:

  • Stores partition replicas assigned to it

  • Handles read and write requests from producers and consumers

  • Participates in partition leadership election

  • Manages data retention and compaction

  • Communicates with other brokers for replication

Brokers are identified by a unique integer ID. When you create a topic with multiple partitions, Kafka distributes partition replicas across available brokers.

Replication and Leadership

For fault tolerance, each partition is replicated across multiple brokers. One replica serves as the leader (handles all reads and writes), while others are followers (replicate data from the leader).

Topic: transactions (2 partitions, replication-factor: 3)

Partition 0:
  Leader: Broker 1
  Followers: Broker 2, Broker 3

Partition 1:
  Leader: Broker 2
  Followers: Broker 1, Broker 3

If a broker fails, partitions where it was the leader automatically elect a new leader from the in-sync replicas (ISR). This ensures continuous availability without data loss.

How Components Work Together

The interaction between topics, partitions, and brokers creates Kafka's distributed log architecture:

┌─────────────────────────────────────────────────────────────────────┐
Kafka Cluster Architecture                          
├─────────────────────────────────────────────────────────────────────┤

Topic: user-events (3 partitions, replication-factor: 3)         

┌─────────────┐      ┌─────────────┐      ┌─────────────┐       
Broker 1   Broker 2   Broker 3   
├─────────────┤      ├─────────────┤      ├─────────────┤       

Partition 0 Partition 0 Partition 1 
  [LEADER]   │─────▶│ [FOLLOWER]    [LEADER]   
offset:9   offset:9   offset:12  

Partition 2 Partition 1 Partition 0 
 [FOLLOWER]   [FOLLOWER]   [FOLLOWER]  
offset:7   offset:12  offset:9   

Partition 1 Partition 2 Partition 2 
 [FOLLOWER]    [LEADER]   │◀─────│ [FOLLOWER]  
offset:12  offset:7   offset:7   
└─────────────┘      └─────────────┘      └─────────────┘       


└────────────────────┴─────────────────────┘              

Replication & Failover                          

┌────────────────────────────────────────────────────────────┐  
Producer: Hash(key) Partition Assignment               
Consumer Groups: Partition Consumer Mapping            
└────────────────────────────────────────────────────────────┘  

└─────────────────────────────────────────────────────────────────────┘

This architecture enables:

  • Horizontal scalability: Add brokers to increase storage and throughput

  • Fault tolerance: Replica failover ensures availability

  • Ordering guarantees: Per-partition ordering with key-based routing

  • Independent consumption: Multiple consumer groups process data at their own pace

Kafka in the Data Streaming Ecosystem

Kafka's architecture makes it the central nervous system of modern data platforms. Topics serve as durable, replayable event streams that connect disparate systems:

  • Stream processing: Frameworks like Kafka Streams and Flink consume from topics, transform data, and write to new topics

  • Data integration: Kafka Connect uses topics as intermediaries between source and sink systems

  • Event-driven microservices: Services publish domain events to topics and subscribe to events from other services

  • Analytics pipelines: Data flows from operational topics into data lakes and warehouses

For complex deployments, tools like Conduktor provide visibility into topic configurations, partition distribution, and broker health. This governance layer helps teams understand data lineage, monitor consumer lag across partitions, and ensure replication factors meet compliance requirements.

Summary

Kafka's architecture is elegantly simple yet powerful. Topics provide logical organization, partitions enable horizontal scaling and parallelism, and brokers offer distributed storage with fault tolerance. Understanding these core components is fundamental to designing effective data streaming solutions.

When planning your Kafka deployment:

  • Choose partition counts based on throughput requirements and parallelism needs

  • Set replication factors (typically 3) to balance fault tolerance and resource usage

  • Distribute partition leadership across brokers to avoid hotspots

  • Monitor ISR status to ensure replicas stay synchronized

  • Consider retention policies that match your data replay and storage requirements

Mastering these architectural fundamentals positions you to build scalable, reliable streaming platforms that form the backbone of modern data infrastructure.

Sources and References