Build Idempotent Kafka Consumers: Patterns That Actually Work
Handle duplicate Kafka messages gracefully. Database constraints, Redis lookups, and the deduplication patterns that scale.

Your consumer will see the same message twice. This isn't a bug—it's how Kafka works.
I've debugged countless production incidents where teams assumed at-least-once meant exactly-once. The producer retries after a timeout. The consumer crashes before committing. A rebalance triggers reprocessing. Each scenario creates duplicates, and each team learns the hard way.
The solution isn't to fight Kafka's delivery semantics. It's to make your consumer idempotent: process the same message twice, get the same result once.
"We stopped chasing exactly-once and embraced idempotent design. Our duplicate rate dropped to zero and our code got simpler." (Senior Engineer at a European fintech)
Why Duplicates Happen
Three scenarios cause duplicates:
- Producer retries: A producer sends a message, the broker writes it, but the acknowledgment gets lost. The producer retries, creating a duplicate.
- Consumer crashes: Your consumer processes a message, calls an external API, then crashes before committing the offset. After restart, Kafka redelivers.
- Rebalancing: During consumer group rebalances, partitions move between consumers. Messages processed but not committed get redelivered.
Kafka's idempotent producer (enable.idempotence=true) only prevents duplicates at the broker level. Consumer-side duplicates are your problem.
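For reference, that broker-level guarantee comes from producer configuration alone. A minimal sketch of the relevant settings, assuming the standard Java client (serializers and bootstrap.servers omitted):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// Producer settings that deduplicate retries at the broker.
// They do nothing about consumer-side redelivery.
static Properties idempotentProducerProps() {
    Properties props = new Properties();
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // enable.idempotence=true
    props.put(ProducerConfig.ACKS_CONFIG, "all");                // required for idempotence
    props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
    return props;
}
```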
The Pattern
An idempotent consumer tracks which messages it has processed:
public void consume(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);
    // Check: have we already handled this logical operation?
    if (deduplicationStore.hasProcessed(key)) {
        log.debug("Skipping duplicate: {}", key);
        return;
    }
    // Process, then mark. The gap between these two steps is the problem.
    processOrder(record.value());
    deduplicationStore.markProcessed(key);
}

The hard part is making this atomic. If you check, process, and mark in three separate steps, a crash between any two of them creates inconsistency.
Designing Idempotency Keys
The key must be unique per logical operation, stable across retries, and ideally generated by the producer.
- Option 1: Client-generated UUID. The safest approach. Generate a UUID at the API layer that flows through the system.
- Option 2: Composite business key. Derive the key from business attributes, such as customerId:orderId:timestamp. Works when the combination is truly unique.
- Option 3: Kafka coordinates. Use topic-partition-offset as the key. Simple, but it breaks if you replay from a different topic (a key-extraction sketch covering these options follows).
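To make the earlier snippets concrete, here is a minimal sketch of an extractIdempotencyKey helper. The idempotency-key header name is an assumption about what your producer writes; the fallback implements option 3:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

// Prefer a producer-supplied key (options 1 and 2); fall back to Kafka coordinates (option 3).
static String extractIdempotencyKey(ConsumerRecord<?, ?> record) {
    Header header = record.headers().lastHeader("idempotency-key"); // assumed header name
    if (header != null && header.value() != null) {
        return new String(header.value(), StandardCharsets.UTF_8);
    }
    // Coordinates are stable for redeliveries of this record, but not across topic replays.
    return record.topic() + "-" + record.partition() + "-" + record.offset();
}
```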
Pattern 1: Database Constraint
The simplest pattern. Use a unique constraint to reject duplicates.
CREATE TABLE processed_messages (
  idempotency_key VARCHAR(255) PRIMARY KEY,
  processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

In your consumer:
@Transactional(isolation = Isolation.READ_COMMITTED)
public void processOrder(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);
    try {
        // Flush immediately so a duplicate key violates the constraint here,
        // inside the try block, rather than only at commit time.
        processedMessageRepo.saveAndFlush(new ProcessedMessage(key));
        orderService.createOrder(record.value());
    } catch (DataIntegrityViolationException e) {
        log.info("Duplicate detected: {}", key);
    }
}

The @Transactional wrapper ensures atomicity. If the order creation fails, the deduplication record also rolls back.
Isolation note: Without proper isolation, concurrent consumers processing the same duplicate can race between check and insert. Use READ_COMMITTED or higher. For high-concurrency scenarios, consider SELECT FOR UPDATE or application-level locking.
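One way to sidestep both the race and the reliance on catching the constraint violation is to make the claim itself a single atomic statement. A sketch using Spring's JdbcTemplate and PostgreSQL's ON CONFLICT DO NOTHING against the table above (the claimKey name and the injected JdbcTemplate are illustrative):

```java
import org.springframework.jdbc.core.JdbcTemplate;

// Returns true only for the first transaction to claim this key; duplicates get false.
// ON CONFLICT DO NOTHING is PostgreSQL syntax; use your database's equivalent elsewhere.
boolean claimKey(JdbcTemplate jdbc, String idempotencyKey) {
    int inserted = jdbc.update(
        "INSERT INTO processed_messages (idempotency_key) VALUES (?) ON CONFLICT DO NOTHING",
        idempotencyKey);
    return inserted == 1; // 0 rows inserted means the key was already claimed
}
```

Call it inside the same transaction as the business write, and skip processing when it returns false.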
Tradeoff: The deduplication table becomes a write hotspot under high load. For high-throughput services, consider Redis.
Pattern 2: Redis SETNX
For higher throughput, use Redis with SETNX:
public boolean tryProcess(String idempotencyKey) {
    // SET ... NX with a TTL: true only for the first caller, false for duplicates.
    Boolean wasSet = redis.opsForValue()
        .setIfAbsent("dedup:" + idempotencyKey, "1", Duration.ofDays(7));
    return Boolean.TRUE.equals(wasSet);
}

The TTL handles cleanup automatically. Set it longer than your maximum expected replay window. Note that this claims the key before processing: if processing then fails, delete the key so the redelivered message isn't skipped (see the sketch below).
Redis persistence: Configure Redis with AOF (appendonly yes, appendfsync everysec) to survive restarts. With RDB-only snapshots, a Redis crash between snapshots loses recent deduplication state, causing duplicates. For critical workloads, use Redis Cluster with replication.
Tradeoff: Redis adds a network hop and infrastructure dependency. If Redis is unavailable, your consumer stops. Consider whether this availability tradeoff works for your use case.
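Putting it together, a sketch of the consumer path that claims the key first and releases it if processing fails, so the redelivered message isn't silently dropped (redis, log, extractIdempotencyKey, and processOrder are the same names used in the surrounding examples):

```java
public void consume(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);
    if (!tryProcess(key)) {
        log.debug("Skipping duplicate: {}", key); // someone already claimed this key
        return;
    }
    try {
        processOrder(record.value());
    } catch (Exception e) {
        // Release the claim so a redelivery of this message can be processed later.
        redis.delete("dedup:" + key);
        throw e;
    }
}
```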
Pattern 3: Natural Idempotency
When your business logic supports upserts, you might not need a separate deduplication store:
INSERT INTO inventory (product_id, quantity, event_version)
VALUES (?, ?, ?)
ON CONFLICT (product_id) DO UPDATE
SET quantity = EXCLUDED.quantity,
    event_version = EXCLUDED.event_version
WHERE inventory.event_version < EXCLUDED.event_version;

The WHERE clause prevents out-of-order events from overwriting newer data, and updating event_version keeps that guard accurate for the next event.
This works for state replacement events. It doesn't work for delta operations ("add 10 to quantity") or side effects (sending emails).
Choosing a Pattern
| Pattern | Throughput | Best For |
|---|---|---|
| Database constraint | Low-Medium | CRUD services, existing DB |
| Redis SETNX | High | Stateless services |
| Natural upsert | Medium-High | State replacement events |
The Deduplication Window
Your deduplication window must be longer than the longest gap between a message's first delivery and any possible redelivery or replay. Consider:
- How long might your consumer be down?
- Will you ever replay from hours or days ago?
- What's your producer's retry behavior?
Seven days is common. Balance storage costs against your replay requirements.
Testing Idempotency
Always verify your consumer handles duplicates:
@Test
void shouldHandleDuplicateMessages() {
    String key = UUID.randomUUID().toString();
    ConsumerRecord<String, OrderEvent> record = createRecord(key);

    processor.process(record);
    processor.process(record);

    assertThat(orderRepository.findAll()).hasSize(1);
}

Test concurrent duplicates too. Race conditions hide until production.
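A sketch of that concurrent case, assuming the same processor, createRecord, and orderRepository as the test above (imports from java.util.concurrent):

```java
@Test
void shouldHandleConcurrentDuplicates() throws Exception {
    String key = UUID.randomUUID().toString();
    ConsumerRecord<String, OrderEvent> record = createRecord(key);

    ExecutorService pool = Executors.newFixedThreadPool(2);
    CountDownLatch start = new CountDownLatch(1);
    Callable<Void> task = () -> {
        start.await();            // release both threads at the same instant
        processor.process(record);
        return null;
    };
    Future<Void> first = pool.submit(task);
    Future<Void> second = pool.submit(task);
    start.countDown();
    first.get();
    second.get();
    pool.shutdown();

    // Exactly one order, even when the duplicates race each other.
    assertThat(orderRepository.findAll()).hasSize(1);
}
```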
At-Least-Once + Idempotent = Effectively Exactly-Once
The combination of at-least-once delivery and idempotent consumers gives you effectively exactly-once processing:
At-least-once + Idempotent consumer = Effectively exactly-once

This is simpler than Kafka's transactional exactly-once, which requires transactional producers, read_committed consumers, and two-phase commit coordination.
For most services, idempotent consumers are sufficient. Build your consumers to handle duplicates, and you'll stop worrying about Kafka's delivery semantics.
Book a demo to see how Conduktor Console helps you trace message flow and identify deduplication issues across your Kafka clusters.