Transactional Outbox: Database-Kafka Consistency

Solve the dual-write problem with the transactional outbox pattern. PostgreSQL setup, Debezium CDC, and idempotent consumers.

Stéphane Derosiaux · August 25, 2025

I've debugged this failure more times than I can count: the database commits, the Kafka write fails, and the two systems are now inconsistent. The order exists in the DB but downstream services never know.

This is the dual-write problem. The transactional outbox pattern is the standard solution.

We lost 6 hours of orders because Kafka was down and the database kept committing. With outbox, the database acts as a buffer. We deployed during a Kafka outage and didn't lose a single event.

Staff Engineer at an e-commerce platform

The Dual-Write Problem

// BROKEN: This code has a consistency bug
public void createOrder(Order order) {
    orderRepository.save(order);  // Database commits
    OrderCreatedEvent event = toEvent(order);  // hypothetical entity-to-event mapper
    kafkaTemplate.send("orders", order.getId(), event);  // Kafka write fails
    // Order exists but no event published
}

Three failure scenarios break this:

  1. Kafka write fails after database commit
  2. Application crashes between writes
  3. Kafka succeeds but database rolls back

Wrapping in @Transactional doesn't help—Kafka isn't part of that transaction.

How the Outbox Pattern Works

Instead of writing to Kafka directly, write the event to an outbox table in the same database transaction. A separate process relays to Kafka.

┌─────────────────────────────────────────┐
│       Single Database Transaction        │
│  ┌─────────────┐    ┌─────────────────┐ │
│  │ orders      │    │ outbox          │ │
│  │ INSERT      │    │ INSERT event    │ │
│  └─────────────┘    └─────────────────┘ │
└─────────────────────────────────────────┘
                        │
                        ▼ Relay (CDC or Polling)
                      Kafka

Both writes succeed or both fail. The relay handles Kafka delivery asynchronously.

Setting Up the Outbox Table

CREATE TABLE outbox (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id VARCHAR(255) NOT NULL,
    event_type VARCHAR(255) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
Column          Purpose
aggregate_type  Routes to the Kafka topic
aggregate_id    Becomes the message key
payload         Event data as JSON

Write both in the same transaction:

@Transactional
public void createOrder(Order order) {
    orderRepository.save(order);
    outboxRepository.save(new OutboxEntry("Order", order.getId(),
        "OrderCreated", toJson(order)));
}
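
A minimal sketch of what OutboxEntry can look like, assuming Spring Data JPA with Hibernate 6 (the @JdbcTypeCode annotation maps the String payload to jsonb; field names mirror the DDL above):

import jakarta.persistence.*;
import java.time.Instant;
import java.util.UUID;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

@Entity
@Table(name = "outbox")
public class OutboxEntry {
    @Id @GeneratedValue
    private UUID id;  // mirrors the gen_random_uuid() default

    @Column(name = "aggregate_type", nullable = false)
    private String aggregateType;  // routes to the Kafka topic

    @Column(name = "aggregate_id", nullable = false)
    private String aggregateId;    // becomes the message key

    @Column(name = "event_type", nullable = false)
    private String eventType;

    @JdbcTypeCode(SqlTypes.JSON)   // Hibernate 6: persist the String as jsonb
    @Column(nullable = false)
    private String payload;

    @Column(name = "created_at", nullable = false)
    private Instant createdAt = Instant.now();

    protected OutboxEntry() {}     // JPA needs a no-arg constructor

    public OutboxEntry(String aggregateType, String aggregateId,
                       String eventType, String payload) {
        this.aggregateType = aggregateType;
        this.aggregateId = aggregateId;
        this.eventType = eventType;
        this.payload = payload;
    }
}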

Polling vs CDC

Polling: Background process queries outbox, publishes to Kafka, deletes rows. Simple, works with any database. Adds 100ms+ latency.

CDC (Debezium): Tails the database transaction log. Near-realtime (milliseconds). More infrastructure.

Start with polling for prototypes. Use CDC for production with latency requirements under 1 second.
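
Here's what a minimal polling relay can look like, as a sketch assuming Spring's @Scheduled, Spring Kafka 3.x (where send() returns a CompletableFuture), and the outbox table above; the topic-from-aggregate_type convention is an assumption to adapt:

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class OutboxRelay {
    private final JdbcTemplate jdbc;
    private final KafkaTemplate<String, String> kafka;

    public OutboxRelay(JdbcTemplate jdbc, KafkaTemplate<String, String> kafka) {
        this.jdbc = jdbc;
        this.kafka = kafka;
    }

    @Scheduled(fixedDelay = 200)  // the poll interval is the latency floor
    public void relay() {
        List<Map<String, Object>> rows = jdbc.queryForList("""
            SELECT id, aggregate_type, aggregate_id, payload::text AS payload
            FROM outbox
            ORDER BY created_at
            LIMIT 100
            """);
        for (Map<String, Object> row : rows) {
            String topic = row.get("aggregate_type").toString();  // assumed topic convention
            kafka.send(topic,
                       row.get("aggregate_id").toString(),
                       row.get("payload").toString())
                 .join();  // wait for the broker ack before deleting
            // A crash between send and delete re-publishes the row: at-least-once
            jdbc.update("DELETE FROM outbox WHERE id = ?", row.get("id"));
        }
    }
}

A single relay instance keeps per-aggregate ordering simple; running several concurrently would need SELECT ... FOR UPDATE SKIP LOCKED inside one transaction so instances don't publish the same rows.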

Debezium Outbox Setup

Enable logical replication in PostgreSQL:

wal_level = logical
max_replication_slots = 4

Deploy the Debezium connector with the EventRouter transform. Kafka Connect monitoring helps track connector status and detect relay failures early.

{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "table.include.list": "public.outbox",
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.route.by.field": "aggregate_type",
  "transforms.outbox.table.field.event.key": "aggregate_id"
}

The EventRouter routes messages to topics based on aggregate_type and uses aggregate_id as the key.
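
With the config above, messages land on topics named outbox.event.<aggregate_type> by default (e.g. outbox.event.Order); the transform's route.topic.replacement option controls the naming pattern if you want different topic names.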

Handling Duplicates

The outbox pattern provides at-least-once delivery. The relay might crash after publishing but before deleting the row. On restart, it publishes again.

Consumers must be idempotent:

public void handleOrderEvent(OrderCreatedEvent event) {
    jdbcTemplate.update("""
        INSERT INTO order_projections (order_id, amount, status)
        VALUES (?, ?, ?)
        ON CONFLICT (order_id) DO NOTHING
        """, event.getOrderId(), event.getAmount(), "CREATED");
}

Or track processed event IDs and skip duplicates.
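
A sketch of that approach, assuming a processed_events table with event_id as its primary key and that the relay carries the outbox row's id through to the consumer (e.g. as a Kafka header); applyOrderCreated is a hypothetical handler:

@Transactional
public void handleOrderEvent(UUID eventId, OrderCreatedEvent event) {
    // The primary key makes the insert a no-op on a duplicate delivery
    int inserted = jdbcTemplate.update(
        "INSERT INTO processed_events (event_id) VALUES (?) ON CONFLICT DO NOTHING",
        eventId);
    if (inserted == 0) {
        return;  // already processed this event; skip the side effects
    }
    applyOrderCreated(event);  // hypothetical business logic, same transaction
}

Because the dedup insert and the business update share one transaction, a crash midway rolls back both and the retry starts clean.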

Outbox Table Cleanup

The table grows unbounded. Options:

Delete after publish (polling): Delete rows after successful Kafka send.

Partition by time (high volume):

CREATE TABLE outbox (...) PARTITION BY RANGE (created_at);
-- Drop entire partitions: instant, no row-by-row delete
DROP TABLE outbox_2026_01;
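
With CDC, the relay never deletes rows itself, so a periodic purge is a common companion; a minimal sketch, assuming Spring's @Scheduled and an arbitrary seven-day retention window:

@Scheduled(cron = "0 0 3 * * *")  // nightly at 03:00; the schedule is an assumption
public void purgeOutbox() {
    // Safe with Debezium: the inserts were already captured from the WAL
    jdbcTemplate.update(
        "DELETE FROM outbox WHERE created_at < NOW() - INTERVAL '7 days'");
}

Some teams delete the outbox row in the same transaction that inserts it; Debezium still sees the insert in the WAL, so nothing is lost and the table stays empty.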

Common Failures

"Replication slot already exists" — Another Debezium instance holds the slot. Check pg_replication_slots and drop if needed.

Events out of order — Different aggregate_id values go to different partitions. Ensure all events for an aggregate use the same key.

High latency (polling) — Reduce poll interval, but watch database load.

Alternatives

Listen to yourself: Write to Kafka first, then update the database by consuming your own event. Tradeoff: the API call fails whenever Kafka is down.

Kafka transactions: Only works for Kafka-to-Kafka. Doesn't solve database + Kafka consistency.

The outbox pattern trades latency for consistency. Events arrive milliseconds to seconds after the commit, but they always arrive.

Book a demo to see how Conduktor Console gives you visibility into topic schemas, consumer lag, and message contents across all your Kafka clusters.