Build Idempotent Kafka Consumers: Patterns That Actually Work

Handle duplicate Kafka messages gracefully. Database constraints, Redis lookups, and the deduplication patterns that scale.

Stéphane Derosiaux · April 26, 2024

Your consumer will see the same message twice. This isn't a bug—it's how Kafka works.

I've debugged countless production incidents where teams assumed at-least-once meant exactly-once. The producer retries after a timeout. The consumer crashes before committing. A rebalance triggers reprocessing. Each scenario creates duplicates, and each team learns the hard way.

The solution isn't to fight Kafka's delivery semantics. It's to make your consumer idempotent: process the same message twice, get the same result once.

We stopped chasing exactly-once and embraced idempotent design. Our duplicate rate dropped to zero and our code got simpler.

Senior Engineer at a European fintech

Why Duplicates Happen

Three scenarios cause duplicates:

Producer retries: A producer sends a message, the broker writes it, but the acknowledgment gets lost. The producer retries, creating a duplicate.

Consumer crashes: Your consumer processes a message, calls an external API, then crashes before committing the offset. After restart, Kafka redelivers.

Rebalancing: During consumer group rebalances, partitions move between consumers. Messages processed but not committed get redelivered.

Kafka's idempotent producer (enable.idempotence=true) only deduplicates producer retries on the broker side. Consumer-side duplicates are your problem.
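
For completeness, here's roughly what enabling it looks like on the producer side. This is a minimal sketch: the broker address, topic name, payload, and serializers are placeholders, not part of any real service.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Broker-side deduplication of producer retries only;
        // consumer-side redeliveries still happen.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // required with idempotence

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"status\":\"created\"}"));
        }
    }
}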

The Pattern

An idempotent consumer tracks which messages it has processed:

public void consume(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    if (deduplicationStore.hasProcessed(key)) {
        log.debug("Skipping duplicate: {}", key);
        return;
    }

    processOrder(record.value());
    deduplicationStore.markProcessed(key);
}

The hard part is making this atomic. If you check, process, and mark in three steps, a crash between any of them creates inconsistency.

Designing Idempotency Keys

The key must be unique per logical operation, stable across retries, and ideally generated by the producer.

Option 1: Client-generated UUID — The safest approach. Generate a UUID at the API layer that flows through the system.

Option 2: Composite business key — Derive from business attributes: customerId:orderId:timestamp. Works when the combination is truly unique.

Option 3: Kafka coordinates — Use topic-partition-offset as a key. Simple but breaks if you replay from a different topic.
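
Whichever option you pick, keep the extraction logic in one place so every consumer derives the key the same way. A minimal sketch of what that might look like; the idempotency-key header name and the OrderEvent accessors are illustrative assumptions, not something the patterns below depend on:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

public final class IdempotencyKeys {

    // Option 1: a client-generated UUID carried in a message header.
    // Falls back to Option 3 (Kafka coordinates) when the header is missing.
    public static String extract(ConsumerRecord<String, OrderEvent> record) {
        Header header = record.headers().lastHeader("idempotency-key");
        if (header != null) {
            return new String(header.value(), StandardCharsets.UTF_8);
        }
        return record.topic() + "-" + record.partition() + "-" + record.offset();
    }

    // Option 2: a composite business key, assuming OrderEvent exposes these fields.
    public static String businessKey(OrderEvent event) {
        return event.getCustomerId() + ":" + event.getOrderId() + ":" + event.getEventTimestamp();
    }
}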

Pattern 1: Database Constraint

The simplest pattern. Use a unique constraint to reject duplicates.

CREATE TABLE processed_messages (
    idempotency_key VARCHAR(255) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

In your consumer:

@Transactional(isolation = Isolation.READ_COMMITTED)
public void processOrder(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    try {
        // saveAndFlush forces the INSERT now, so a duplicate key
        // surfaces here instead of silently at commit time.
        processedMessageRepo.saveAndFlush(new ProcessedMessage(key));
        orderService.createOrder(record.value());
    } catch (DataIntegrityViolationException e) {
        // Duplicate: nothing else has been written yet, so skipping is safe.
        log.info("Duplicate detected: {}", key);
    }
}

The @Transactional wrapper ensures atomicity. If the order creation fails, the deduplication record also rolls back.

Isolation note: Without proper isolation, concurrent consumers processing the same duplicate can race between check and insert. Use READ_COMMITTED or higher. For high-concurrency scenarios, consider SELECT FOR UPDATE or application-level locking.

Tradeoff: The deduplication table becomes a write hotspot under high load. For high-throughput services, consider Redis.

Pattern 2: Redis SETNX

For higher throughput, use Redis with SETNX:

public boolean tryProcess(String idempotencyKey) {
    Boolean wasSet = redis.opsForValue()
        .setIfAbsent("dedup:" + idempotencyKey, "1", Duration.ofDays(7));
    return Boolean.TRUE.equals(wasSet);
}

The TTL handles cleanup automatically. Set it longer than your maximum expected replay window.

Redis persistence: Configure Redis with AOF (appendonly yes, appendfsync everysec) to survive restarts. With RDB-only snapshots, a Redis crash between snapshots loses recent deduplication state, causing duplicates. For critical workloads, use Redis Cluster with replication.

Tradeoff: Redis adds a network hop and infrastructure dependency. If Redis is unavailable, your consumer stops. Consider whether this availability tradeoff works for your use case.
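
Here's one way the SETNX check might be wired into the consumer, reusing tryProcess, the redis template, extractIdempotencyKey, and processOrder from the earlier snippets. Releasing the key when processing fails is what lets a later redelivery retry the work; this wiring is a sketch, not the only way to do it:

public void consume(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    // First writer wins: only one consumer instance claims this key.
    if (!tryProcess(key)) {
        log.debug("Skipping duplicate: {}", key);
        return;
    }

    try {
        processOrder(record.value());
    } catch (Exception e) {
        // Release the claim so the redelivered message can be retried.
        redis.delete("dedup:" + key);
        throw e;
    }
}

A hard crash between the SETNX and processOrder still leaves the key claimed with no order written; the TTL eventually clears it, which is one more reason the database-constraint pattern is attractive when the mark and the write can share a transaction.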

Pattern 3: Natural Idempotency

When your business logic supports upserts, you might not need a separate deduplication store:

INSERT INTO inventory (product_id, quantity, event_version)
VALUES (?, ?, ?)
ON CONFLICT (product_id)
DO UPDATE SET quantity = EXCLUDED.quantity,
              event_version = EXCLUDED.event_version
WHERE inventory.event_version < EXCLUDED.event_version;

The WHERE clause prevents out-of-order events from overwriting newer data.

This works for state replacement events. It doesn't work for delta operations ("add 10 to quantity") or side effects (sending emails).
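
Driving that upsert from a consumer is then a single statement per event. A sketch using Spring's JdbcTemplate; the InventoryEvent type and its accessors are assumed for illustration:

public void consume(ConsumerRecord<String, InventoryEvent> record) {
    InventoryEvent event = record.value();

    // Re-running this with the same event is a no-op: the conflict clause
    // plus the version guard make the write naturally idempotent.
    jdbcTemplate.update(
        "INSERT INTO inventory (product_id, quantity, event_version) VALUES (?, ?, ?) " +
        "ON CONFLICT (product_id) DO UPDATE SET quantity = EXCLUDED.quantity, " +
        "event_version = EXCLUDED.event_version " +
        "WHERE inventory.event_version < EXCLUDED.event_version",
        event.getProductId(), event.getQuantity(), event.getEventVersion());
}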

Choosing a Pattern

Pattern               Throughput     Best For
Database constraint   Low-Medium     CRUD services, existing DB
Redis SETNX           High           Stateless services
Natural upsert        Medium-High    State replacement events

For most services that read from Kafka and write to a database, the database constraint pattern is sufficient and simpler than full exactly-once transactions.

The Deduplication Window

Your window must be longer than your maximum expected delay. Consider:

  • How long might your consumer be down?
  • Will you ever replay from hours or days ago?
  • What's your producer's retry behavior?

Seven days is common. Balance storage costs against your replay requirements.

Testing Idempotency

Always verify your consumer handles duplicates:

@Test
void shouldHandleDuplicateMessages() {
    String key = UUID.randomUUID().toString();
    ConsumerRecord<String, OrderEvent> record = createRecord(key);

    processor.process(record);
    processor.process(record);

    assertThat(orderRepository.findAll()).hasSize(1);
}

Test concurrent duplicates too. Race conditions hide until production.
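
A concurrent variant of the test above might look like this. It reuses the same processor and repository; the thread count and timeout are arbitrary:

@Test
void shouldHandleConcurrentDuplicates() throws Exception {
    String key = UUID.randomUUID().toString();
    ConsumerRecord<String, OrderEvent> record = createRecord(key);

    int threads = 8;
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch ready = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
        pool.submit(() -> {
            ready.await();                 // release all threads at the same instant
            processor.process(record);
            return null;
        });
    }
    ready.countDown();
    pool.shutdown();
    assertThat(pool.awaitTermination(10, TimeUnit.SECONDS)).isTrue();

    // Exactly one order regardless of which thread won the race.
    assertThat(orderRepository.findAll()).hasSize(1);
}

If this test flakes, you most likely have a check-then-insert race that the unique constraint or SETNX was supposed to close.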

At-Least-Once + Idempotent = Effectively Exactly-Once

The combination of at-least-once delivery and idempotent consumers gives you effectively exactly-once processing:

At-least-once + Idempotent consumer = Effectively exactly-once

This is simpler than Kafka's transactional exactly-once, which requires transactional producers, read_committed consumers, and two-phase commit coordination.

For most services, idempotent consumers are sufficient. Build your consumers to handle duplicates, and you'll stop worrying about Kafka's delivery semantics.

Book a demo to see how Conduktor Console helps you trace message flow and identify deduplication issues across your Kafka clusters.