Kafka as a Database: When to Use Compacted Topics for State

Use Kafka compacted topics as a lightweight state store: log compaction configuration, query limitations, and when to choose a real database instead.

Stéphane Derosiaux · September 10, 2025

Compacted topics turn Kafka from a message transport into a state layer. You get key-value semantics, durable storage, and built-in replication.

But Kafka is not a database. I've watched teams learn this the hard way—building query patterns that work in development and collapse in production.

"We built our entire user profile system on compacted topics. Worked great until we hit 10 million users and every 'lookup' required scanning from offset zero."

Tech Lead at a consumer app

How Compaction Works

Standard Kafka topics are append-only with time-based retention. Compacted topics retain the latest value for each key indefinitely.

kafka-topics --bootstrap-server localhost:9092 \
  --create --topic user-profiles \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.1

You can also create and configure topics visually instead of managing CLI commands.

Produce multiple updates for the same key: before compaction, every message for that key exists in the log; after compaction, only the latest value survives.
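
A minimal sketch of that sequence with the Java producer, assuming standard client imports and String serialization (topic, key, and values are illustrative):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());

try (var producer = new KafkaProducer<String, String>(props)) {
    // Two updates for the same key: both exist before compaction runs,
    // only the {"plan":"pro"} record survives afterward
    producer.send(new ProducerRecord<>("user-profiles", "user-123", "{\"plan\":\"free\"}"));
    producer.send(new ProducerRecord<>("user-profiles", "user-123", "{\"plan\":\"pro\"}"));
}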

What You Get (and Don't Get)

Compacted topics give you:

  • Latest value per key
  • Durability across broker failures
  • Ordering within partition
  • Tombstone support (delete a key by sending a null value; see the sketch after this list)
  • Replay capability from offset 0
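
A tombstone is just an ordinary record with a key and a null value. A minimal sketch, reusing a String-serialized producer like the one above (topic and key are illustrative):

// Delete "user-123": send a tombstone (keyed record with a null value).
// Compaction drops earlier values for the key, and removes the tombstone
// itself after delete.retention.ms (default 24 hours).
producer.send(new ProducerRecord<String, String>("user-profiles", "user-123", null));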

Compacted topics don't give you:

  • Point queries (SELECT * FROM topic WHERE key = X)
  • Indexes
  • Transactions across keys
  • Read-after-write guarantees

The fundamental limitation: every "query" is a full topic scan.
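
To make that cost concrete, here is roughly what a "point query" against a bare compacted topic looks like; a sketch assuming String-serialized records (topic, group, and key names are illustrative):

// "SELECT value WHERE key = 'user-123'" without an index:
// scan the whole topic from offset 0 and keep the last match.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "lookup-scan");
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

String result = null;
try (var consumer = new KafkaConsumer<String, String>(props)) {
    consumer.subscribe(List.of("user-profiles"));
    // Simplified: a real scan tracks end offsets to know when to stop
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        if (records.isEmpty()) break;
        for (ConsumerRecord<String, String> r : records) {
            if ("user-123".equals(r.key())) result = r.value();  // last write wins
        }
    }
}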

The Pattern That Works: KTables

The architecture that makes compacted topics useful isn't reading them directly. It's materializing them into a local store.

StreamsBuilder builder = new StreamsBuilder();

// Materialize the compacted topic into a local key-value store
KTable<String, UserProfile> users = builder.table(
    "user-profiles",
    Materialized.as("users-store")
);

// After streams.start(): fast local lookup against the materialized store
ReadOnlyKeyValueStore<String, UserProfile> store =
    streams.store(StoreQueryParameters.fromNameAndType(
        "users-store", QueryableStoreTypes.keyValueStore()));

UserProfile user = store.get("user-123");  // Milliseconds, not minutes

The compacted topic is the source of truth. The local RocksDB store is a cache. On restart, Kafka Streams replays the topic to rebuild the store.

This is the "Kafka as database" pattern that actually works.

Configuration for State Stores

kafka-topics --bootstrap-server localhost:9092 \
  --create --topic state-changelog \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.1 \
  --config segment.ms=300000 \
  --config max.compaction.lag.ms=86400000

Parameter                   Value      Effect
min.cleanable.dirty.ratio   0.1        Compact when 10% of the log is duplicates
segment.ms                  300000     Roll segments quickly (every 5 minutes)
max.compaction.lag.ms       86400000   Force compaction within 24 hours

Tradeoff: a lower ratio means more frequent compaction but higher broker CPU.
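
The same settings can also be applied to an existing topic programmatically; a sketch with the AdminClient, assuming standard admin client imports (topic name matches the example above):

try (Admin admin = Admin.create(Map.of(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "state-changelog");
    Collection<AlterConfigOp> ops = List.of(
        new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact"), AlterConfigOp.OpType.SET),
        new AlterConfigOp(new ConfigEntry("min.cleanable.dirty.ratio", "0.1"), AlterConfigOp.OpType.SET),
        new AlterConfigOp(new ConfigEntry("segment.ms", "300000"), AlterConfigOp.OpType.SET),
        new AlterConfigOp(new ConfigEntry("max.compaction.lag.ms", "86400000"), AlterConfigOp.OpType.SET)
    );

    // Incremental alter only touches the listed configs
    admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
}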

Good Fit vs Poor Fit

Good fit:

  • CDC changelog topics (row state keyed by primary key)
  • Configuration distribution
  • Kafka Streams state stores
  • Entity snapshots for downstream consumers

Poor fit:

  • Point queries at scale
  • Complex queries (filtering, joining)
  • High-cardinality random access
  • Low-latency reads without materialization

Common Errors

Null keys rejected:

Compacted topic cannot accept message without key

Compaction requires keys. Every producer must set one.

Compaction not running: Check segment.ms. Compaction only runs on closed segments. Low-throughput topics may keep segments open for days.
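
To check what a topic is actually running with, read its live config back; a sketch with the AdminClient (topic name illustrative):

try (Admin admin = Admin.create(Map.of(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "user-profiles");
    Config config = admin.describeConfigs(List.of(topic)).all().get().get(topic);

    // A low-throughput topic with a large segment.ms may never close its
    // active segment, so compaction never gets a chance to run
    System.out.println("segment.ms     = " + config.get("segment.ms").value());
    System.out.println("cleanup.policy = " + config.get("cleanup.policy").value());
}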

The Hybrid Pattern

For production systems needing both Kafka's durability and database queries:

Producer → Compacted Topic → Kafka Streams → Local RocksDB
              (truth)         (materialize)    (fast lookups)

The topic is the log. Everything else is a derived view. If the downstream store fails, rebuild from the topic.

Compacted topics are powerful when used correctly. They're not a database replacement—they're a durable, replayable source of truth that feeds databases, caches, and local stores.

Book a demo to see how Conduktor Console provides visual configuration management and compaction metrics.