Build Idempotent Kafka Consumers: Patterns That Actually Work

Handle duplicate Kafka messages gracefully. Database constraints, Redis lookups, and the deduplication patterns that scale.

Stéphane Derosiaux · April 26, 2024

Your consumer will see the same message twice. This isn't a bug—it's how Kafka works.

I've debugged countless production incidents where teams assumed at-least-once meant exactly-once. The producer retries after a timeout. The consumer crashes before committing. A rebalance triggers reprocessing. Each scenario creates duplicates, and each team learns the hard way.

The solution isn't to fight Kafka's delivery semantics. It's to make your consumer idempotent: process the same message twice, get the same result once.

We stopped chasing exactly-once and embraced idempotent design. Our duplicate rate dropped to zero and our code got simpler.

Senior Engineer at a European fintech

Why Duplicates Happen

Three scenarios cause duplicates:

Producer retries: A producer sends a message, the broker writes it, but the acknowledgment gets lost. The producer retries, creating a duplicate.

Consumer crashes: Your consumer processes a message, calls an external API, then crashes before committing the offset. After restart, Kafka redelivers.

Rebalancing: During consumer group rebalances, partitions move between consumers. Messages processed but not committed get redelivered.

Kafka's idempotent producer (enable.idempotence=true) only deduplicates producer retries on the broker side. Consumer-side duplicates are your problem.
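
For completeness, here's roughly what enabling it looks like on the producer side. This is a minimal sketch: the broker address, topic name, payload, and serializers are placeholders, not part of any real service.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Broker-side deduplication of producer retries only;
        // consumer-side redeliveries still happen.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // required with idempotence

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"status\":\"created\"}"));
        }
    }
}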

The Pattern

An idempotent consumer tracks which messages it has processed:

public void consume(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    if (deduplicationStore.hasProcessed(key)) {
        log.debug("Skipping duplicate: {}", key);
        return;
    }

    processOrder(record.value());
    deduplicationStore.markProcessed(key);
}

The hard part is making this atomic. If you check, process, and mark in three steps, a crash between any of them creates inconsistency.

Designing Idempotency Keys

The key must be unique per logical operation, stable across retries, and ideally generated by the producer.

Option 1: Client-generated UUID — The safest approach. Generate a UUID at the API layer that flows through the system.

Option 2: Composite business key — Derive from business attributes: customerId:orderId:timestamp. Works when the combination is truly unique.

Option 3: Kafka coordinates — Use topic-partition-offset as a key. Simple but breaks if you replay from a different topic.
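
Whichever option you pick, keep the extraction logic in one place so every consumer derives the key the same way. A minimal sketch of what that might look like; the idempotency-key header name and the OrderEvent accessors are illustrative assumptions, not something the patterns below depend on:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

public final class IdempotencyKeys {

    // Option 1: a client-generated UUID carried in a message header.
    // Falls back to Option 3 (Kafka coordinates) when the header is missing.
    public static String extract(ConsumerRecord<String, OrderEvent> record) {
        Header header = record.headers().lastHeader("idempotency-key");
        if (header != null) {
            return new String(header.value(), StandardCharsets.UTF_8);
        }
        return record.topic() + "-" + record.partition() + "-" + record.offset();
    }

    // Option 2: a composite business key, assuming OrderEvent exposes these fields.
    public static String businessKey(OrderEvent event) {
        return event.getCustomerId() + ":" + event.getOrderId() + ":" + event.getEventTimestamp();
    }
}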

Pattern 1: Database Constraint

The simplest pattern. Use a unique constraint to reject duplicates.

CREATE TABLE processed_messages (
    idempotency_key VARCHAR(255) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

In your consumer:

@Transactional(isolation = Isolation.READ_COMMITTED)
public void processOrder(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    try {
        // saveAndFlush forces the INSERT now, so a duplicate key
        // surfaces here instead of silently at commit time.
        processedMessageRepo.saveAndFlush(new ProcessedMessage(key));
        orderService.createOrder(record.value());
    } catch (DataIntegrityViolationException e) {
        // Duplicate: nothing else has been written yet, so skipping is safe.
        log.info("Duplicate detected: {}", key);
    }
}

The @Transactional wrapper ensures atomicity. If the order creation fails, the deduplication record also rolls back.

Isolation note: Without proper isolation, concurrent consumers processing the same duplicate can race between check and insert. Use READ_COMMITTED or higher. For high-concurrency scenarios, consider SELECT FOR UPDATE or application-level locking.

Tradeoff: The deduplication table becomes a write hotspot under high load. For high-throughput services, consider Redis.

Pattern 2: Redis SETNX

For higher throughput, use Redis with SETNX:

public boolean tryProcess(String idempotencyKey) {
    Boolean wasSet = redis.opsForValue()
        .setIfAbsent("dedup:" + idempotencyKey, "1", Duration.ofDays(7));
    return Boolean.TRUE.equals(wasSet);
}

The TTL handles cleanup automatically. Set it longer than your maximum expected replay window.

Redis persistence: Configure Redis with AOF (appendonly yes, appendfsync everysec) to survive restarts. With RDB-only snapshots, a Redis crash between snapshots loses recent deduplication state, causing duplicates. For critical workloads, use Redis Cluster with replication.

Tradeoff: Redis adds a network hop and infrastructure dependency. If Redis is unavailable, your consumer stops. Consider whether this availability tradeoff works for your use case.
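
Here's one way the SETNX check might be wired into the consumer, reusing tryProcess, the redis template, extractIdempotencyKey, and processOrder from the earlier snippets. Releasing the key when processing fails is what lets a later redelivery retry the work; this wiring is a sketch, not the only way to do it:

public void consume(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    // First writer wins: only one consumer instance claims this key.
    if (!tryProcess(key)) {
        log.debug("Skipping duplicate: {}", key);
        return;
    }

    try {
        processOrder(record.value());
    } catch (Exception e) {
        // Release the claim so the redelivered message can be retried.
        redis.delete("dedup:" + key);
        throw e;
    }
}

A hard crash between the SETNX and processOrder still leaves the key claimed with no order written; the TTL eventually clears it, which is one more reason the database-constraint pattern is attractive when the mark and the write can share a transaction.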

Pattern 3: Natural Idempotency

When your business logic supports upserts, you might not need a separate deduplication store:

INSERT INTO inventory (product_id, quantity, event_version)
VALUES (?, ?, ?)
ON CONFLICT (product_id)
DO UPDATE SET quantity = EXCLUDED.quantity,
              event_version = EXCLUDED.event_version
WHERE inventory.event_version < EXCLUDED.event_version;

The WHERE clause prevents out-of-order events from overwriting newer data.

This works for state replacement events. It doesn't work for delta operations ("add 10 to quantity") or side effects (sending emails).
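
Driving that upsert from a consumer is then a single statement per event. A sketch using Spring's JdbcTemplate; the InventoryEvent type and its accessors are assumed for illustration:

public void consume(ConsumerRecord<String, InventoryEvent> record) {
    InventoryEvent event = record.value();

    // Re-running this with the same event is a no-op: the conflict clause
    // plus the version guard make the write naturally idempotent.
    jdbcTemplate.update(
        "INSERT INTO inventory (product_id, quantity, event_version) VALUES (?, ?, ?) " +
        "ON CONFLICT (product_id) DO UPDATE SET quantity = EXCLUDED.quantity, " +
        "event_version = EXCLUDED.event_version " +
        "WHERE inventory.event_version < EXCLUDED.event_version",
        event.getProductId(), event.getQuantity(), event.getEventVersion());
}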

Choosing a Pattern

Pattern               Throughput     Best For
Database constraint   Low-Medium     CRUD services, existing DB
Redis SETNX           High           Stateless services
Natural upsert        Medium-High    State replacement events

For most services that read from Kafka and write to a database, the database constraint pattern is sufficient and simpler than full exactly-once transactions.

The Deduplication Window

Your window must be longer than your maximum expected delay. Consider:

  • How long might your consumer be down?
  • Will you ever replay from hours or days ago?
  • What's your producer's retry behavior?

Seven days is common. Balance storage costs against your replay requirements.

Testing Idempotency

Always verify your consumer handles duplicates:

@Test
void shouldHandleDuplicateMessages() {
    String key = UUID.randomUUID().toString();
    ConsumerRecord<String, OrderEvent> record = createRecord(key);

    processor.process(record);
    processor.process(record);

    assertThat(orderRepository.findAll()).hasSize(1);
}

Test concurrent duplicates too. Race conditions hide until production.
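
A concurrent variant of the test above might look like this. It reuses the same processor and repository; the thread count and timeout are arbitrary:

@Test
void shouldHandleConcurrentDuplicates() throws Exception {
    String key = UUID.randomUUID().toString();
    ConsumerRecord<String, OrderEvent> record = createRecord(key);

    int threads = 8;
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch ready = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
        pool.submit(() -> {
            ready.await();                 // release all threads at the same instant
            processor.process(record);
            return null;
        });
    }
    ready.countDown();
    pool.shutdown();
    assertThat(pool.awaitTermination(10, TimeUnit.SECONDS)).isTrue();

    // Exactly one order regardless of which thread won the race.
    assertThat(orderRepository.findAll()).hasSize(1);
}

If this test flakes, you most likely have a check-then-insert race that the unique constraint or SETNX was supposed to close.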

At-Least-Once + Idempotent = Effectively Exactly-Once

The combination of at-least-once delivery and idempotent consumers gives you effectively exactly-once processing:

At-least-once + Idempotent consumer = Effectively exactly-once

This is simpler than Kafka's transactional exactly-once, which requires transactional producers, read_committed consumers, and two-phase commit coordination.

For most services, idempotent consumers are sufficient. Build your consumers to handle duplicates, and you'll stop worrying about Kafka's delivery semantics.

Book a demo to see how Conduktor Console helps you trace message flow and identify deduplication issues across your Kafka clusters.