Event sourcing with Kafka: patterns and pitfalls
Build event-sourced systems on Kafka. Topic design, aggregate reconstruction, CQRS projections, and the gotchas that break production.

Event sourcing stores state as a sequence of events rather than current values. Kafka's append-only log makes it a natural fit—but Kafka wasn't designed as an event store. The gaps between "natural fit" and "production-ready event store" are where teams get burned.
I've seen the pattern work beautifully for CQRS projections. I've also seen it fail spectacularly when teams try to reconstruct aggregates from topics with millions of events.
"We use Kafka for event sourcing, but we learned to keep aggregate state in a database. Reconstructing from topics doesn't scale."
- Architect at a European bank
Why Kafka for Event Sourcing
Traditional event stores (EventStoreDB, Axon Server) provide purpose-built features: aggregate streams, optimistic concurrency, subscriptions. Kafka provides none of these natively.
So why use it? You probably already have Kafka running. The throughput is massive. Consumer groups handle parallel processing. The ecosystem (Streams, Connect, ksqlDB) is mature.
What Kafka lacks: Per-aggregate queries, built-in optimistic concurrency, aggregate snapshots, event upcasting.
If your events are primarily consumed by downstream projections—read models, analytics, search indexes—Kafka is excellent. If you need to frequently reconstruct individual aggregates from their event history, Kafka requires additional tooling.
Topic Design
One topic per aggregate type with the aggregate ID as the message key. You can manage topic configuration including retention through Conduktor Console.
kafka-topics --bootstrap-server localhost:9092 \
--create --topic order-events \
--partitions 12 \
--config retention.ms=-1
Why infinite retention: Events are your source of truth. Deleting them means losing history.
Key = aggregate ID ensures all events for one aggregate land in the same partition, preserving order.
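A minimal producer sketch of that keying, assuming JSON-as-String payloads and the order-events topic created above; the essential part is passing the aggregate ID as the record key.
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");  // avoid duplicate events on producer retries
props.put(ProducerConfig.ACKS_CONFIG, "all");                 // don't lose your source of truth

try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    String orderId = "order-42";  // aggregate ID = message key, so all events for this order share a partition
    String payload = "{\"type\":\"OrderPlaced\",\"orderId\":\"order-42\",\"total\":99.90}";
    producer.send(new ProducerRecord<>("order-events", orderId, payload));
}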
The Aggregate Reconstruction Problem
Kafka has no "give me all events for key X" API. You must:
- Determine which partition the key hashes to
- Scan the entire partition looking for matching keys
- Hope you've seen all events (you can't know for sure)
For a topic with millions of events, scanning a partition to find 50 events for one aggregate is extremely inefficient.
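To make the inefficiency concrete, here is a sketch of that brute-force scan, assuming you already know which partition the key hashes to (the default partitioner uses a murmur2 hash of the key) and that values are plain strings. It is the approach to avoid on a hot path, not a recommendation.
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.*;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

TopicPartition tp = new TopicPartition("order-events", 7);  // partition the key hashes to (assumed here)
List<String> history = new ArrayList<>();
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.assign(Collections.singletonList(tp));
    consumer.seekToBeginning(Collections.singletonList(tp));
    long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
    while (consumer.position(tp) < end) {
        for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
            if ("order-42".equals(rec.key())) {
                history.add(rec.value());  // 50 matches may hide behind millions of records
            }
        }
    }
}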
Solution: Maintain State
Don't reconstruct from Kafka on demand. Build and maintain aggregate state as events arrive:
// events is a KStream<String, OrderEvent> read from the order-events topic
KTable<String, Order> orders = events
    .groupByKey()
    .aggregate(
        Order::new,                                     // start from an empty aggregate
        (orderId, event, order) -> order.apply(event),  // fold each event into current state
        Materialized.as("order-store")                  // local, queryable state store
    );
Or store current state in a database alongside Kafka. You now have two sources of truth, but the database gives you O(1) lookups.
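Staying with the Streams option: once that topology runs inside a KafkaStreams instance (called streams in this sketch), the materialized store answers per-aggregate lookups directly via interactive queries, with no partition scan.
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

ReadOnlyKeyValueStore<String, Order> store = streams.store(
    StoreQueryParameters.fromNameAndType("order-store", QueryableStoreTypes.keyValueStore()));
Order order = store.get("order-42");  // current state for one aggregate
With more than one Streams instance, a given key may live on another host, so production setups pair this with KafkaStreams#queryMetadataForKey or an RPC layer in front of the stores.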
Solution: Snapshots
Periodically save aggregate state to a compacted topic:
kafka-topics --bootstrap-server localhost:9092 \
--create --topic order-snapshots \
--config cleanup.policy=compact
On reconstruction, load the snapshot and replay only the events recorded since it was taken. This reduces reconstruction time from O(all events) to O(events since snapshot).
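A minimal sketch of the write side, assuming the Order aggregate tracks the version of the last event it applied and a hypothetical serialize() helper; keying the snapshot by aggregate ID is what lets compaction keep only the newest snapshot per aggregate.
// Periodically (every N events, or on a timer) publish the current aggregate state.
String snapshotJson = serialize(order);  // hypothetical helper; payload embeds order.version
producer.send(new ProducerRecord<>("order-snapshots", order.getId(), snapshotJson));

// Reconstruction: read the latest snapshot for the key from order-snapshots,
// then replay order-events for that aggregate only where event.version > snapshot.version.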
CQRS Projections
This is where Kafka shines. Multiple consumer groups, each building a different view:
@KafkaListener(topics = "order-events", groupId = "search-projection")
public void buildSearchIndex(OrderEvent event) { ... }
@KafkaListener(topics = "order-events", groupId = "analytics-projection")
public void buildAnalytics(OrderEvent event) { ... }
Each group maintains its own offset. You can add new projections anytime, rebuild by resetting offsets to the beginning, and scale each independently.
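Rebuilding is just an offset reset. With the consumers in that group stopped, something like the following replays the whole topic through the search projection from the beginning:
kafka-consumer-groups --bootstrap-server localhost:9092 \
--group search-projection \
--topic order-events \
--reset-offsets --to-earliest --execute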
The Pitfalls
Log Compaction Deletes History
# WRONG: This deletes historical events
cleanup.policy=compact
Compaction keeps only the latest value per key. For event sourcing, this is catastrophic. Use retention.ms=-1 for event topics. Use compaction only for snapshots.
No Optimistic Concurrency
Traditional event stores reject writes when the expected version doesn't match. Kafka has no such mechanism.
Service A reads version 5
Service B reads version 5
Service A writes version 6 ✓
Service B writes version 6 ✓ // Both succeed—version conflict!
Solutions: External locking (Redis, ZooKeeper), single writer per aggregate (partition commands), or detect-and-compensate (deduplicate in consumers).
Out-of-Order Events
Multiple producers writing events for the same aggregate can cause ordering issues. Within a partition, Kafka preserves the order in which records arrive at the broker, not your logical event sequence.
Fix: Route all commands for an aggregate through one service instance, or validate version continuity in consumers.
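A sketch of the consumer-side check, which doubles as the detect-and-compensate option from the previous section. It assumes each event carries a per-aggregate, monotonically increasing version number.
import java.util.HashMap;
import java.util.Map;

// Track the last applied version per aggregate and reject gaps or duplicates.
Map<String, Long> lastVersion = new HashMap<>();

void apply(String aggregateId, long eventVersion, OrderEvent event) {
    long expected = lastVersion.getOrDefault(aggregateId, 0L) + 1;
    if (eventVersion < expected) {
        return;  // duplicate or stale event: skip, so replays stay idempotent
    }
    if (eventVersion > expected) {
        // a gap: park the event, alert, or trigger a rebuild; here we simply fail fast
        throw new IllegalStateException(
            "Gap for " + aggregateId + ": expected " + expected + ", got " + eventVersion);
    }
    // versions are contiguous: apply the event to the projection or aggregate here
    lastVersion.put(aggregateId, eventVersion);
}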
Schema Evolution
Your events will change. Without schema management, old consumers break.
Use Schema Registry from day one. Set compatibility mode to allow adding optional fields without breaking consumers.
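For example, with Confluent Schema Registry at localhost:8081 (an assumption) and the default topic-name subject strategy, backward compatibility for the order events subject can be set like this:
curl -X PUT http://localhost:8081/config/order-events-value \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"compatibility": "BACKWARD"}'
BACKWARD compatibility lets consumers on the new schema read events written with the previous one, which is exactly what adding an optional field with a default gives you.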
When Not to Use Kafka
Kafka isn't ideal for:
- Frequent aggregate reconstruction: Inefficient without additional tooling
- Strong optimistic concurrency: Can't reject conflicting writes
- Small event volumes: Operational overhead isn't worth it for <10K events/day
Consider purpose-built event stores for these cases.
When It Works
Event sourcing with Kafka works well when your primary use case is building CQRS projections from event streams. The pattern scales to massive throughput. The ecosystem integration is excellent.
Know the limitations before committing. Don't treat Kafka as a general-purpose event store. Use it for what it does best: durably streaming events to multiple consumers at scale.
Book a demo to see how Conduktor Console shows consumer lag and event payloads across your Kafka clusters.