GDPR and Kafka: Right to Erasure

Stéphane Derosiaux · April 24, 2024

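GDPR Article 17 gives users the right to have their personal data erased, and Article 12(3) requires a response without undue delay and within one month of the request, which most teams treat as a 30-day deadline. Kafka's append-only log offers no way to delete an individual record on demand.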

I've helped multiple companies navigate this conflict. The solutions exist, but each has tradeoffs you need to understand before implementation.

"We thought GDPR would force us off Kafka. Crypto shredding let us keep our event-driven architecture while satisfying regulators."
(Compliance Engineer at a European bank)

Why Kafka Makes Erasure Difficult

You cannot surgically remove records from a Kafka topic. The data lives in log segments on every replica, and copies spread to consumer state stores and downstream databases. Physically deleting one record would mean rewriting segments, coordinating the rewrite across replicas, and invalidating offsets. Kafka does not support this.

Solution 1: Short Retention

Configure retention below 30 days:

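# 28 days
retention.ms=2419200000

Kafka then purges old data automatically, so a deletion request is satisfied by waiting out the retention window. Retention is enforced per segment, not per record: a segment is deleted only once its newest record is older than retention.ms, so a record can outlive the setting by up to the segment roll interval (segment.ms, 7 days by default). Lower segment.ms if the deadline has to hold strictly.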
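To check a configuration against the erasure deadline, the worst case can be estimated as retention.ms plus segment.ms plus the broker's retention check interval. The sketch below is an approximation under that assumption; RetentionBudget and its methods are illustrative names, not part of any Kafka API.

```java
import java.time.Duration;

/** Illustrative helper (not a Kafka API): estimates how long a record can stay on disk. */
class RetentionBudget {

    /**
     * A segment becomes deletable only once its newest record is older than
     * retention.ms, so the oldest record in it can outlive retention.ms by
     * roughly the segment roll time, plus the wait for the next retention check.
     */
    static Duration worstCaseLifetime(Duration retention, Duration segmentRoll, Duration checkInterval) {
        return retention.plus(segmentRoll).plus(checkInterval);
    }

    /** True if the estimated worst-case lifetime stays within the erasure deadline. */
    static boolean fitsDeadline(Duration retention, Duration segmentRoll,
                                Duration checkInterval, Duration deadline) {
        return worstCaseLifetime(retention, segmentRoll, checkInterval).compareTo(deadline) <= 0;
    }
}
```

With 28 days of retention and the default 7-day segment.ms, the estimate is just over 35 days, which misses a 30-day deadline. Dropping segment.ms to 1 day brings it to just over 29 days.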

When this works: Event streaming consumed quickly, real-time dashboards, staging topics.

When this fails: Audit requirements mandate longer retention, event sourcing needs full history.

Solution 2: Tombstones on Compacted Topics

Compacted topics retain only the latest record per key. A tombstone (null value) signals deletion.

cleanup.policy=compact
# 7 days: tombstones must remain visible long enough for all consumers to read them
delete.retention.ms=604800000

Why 7 days? A consumer that stays offline for longer than delete.retention.ms may never see the tombstone, so it never deletes its local state. The default of 24 hours is too short to cover maintenance windows or holiday outages.

Publish a tombstone:

producer.send(new ProducerRecord<>("users", "user123", null));

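Once the log cleaner has compacted the segment, the old value is gone. Compaction is asynchronous and skips the active segment, so set max.compaction.lag.ms to bound how long the old value can survive. Consumers must handle nulls: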

if (user == null) {
    userCache.remove(key);  // Tombstone: delete from local state
}
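The interplay between compaction, tombstone expiry, and consumer state can be modelled in memory. This is a simplified sketch, not Kafka's log cleaner: CompactionSketch, Rec, and the method names are all hypothetical, and real compaction works segment by segment in the background.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative in-memory model of a compacted topic; not Kafka's log cleaner. */
class CompactionSketch {

    /** A keyed record; a null value marks a tombstone. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** Keeps only the latest record per key, as compaction eventually does. */
    static List<Rec> compact(List<Rec> log) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec r : log) {
            latest.remove(r.key);   // re-insert so ordering follows the latest offset
            latest.put(r.key, r);
        }
        return new ArrayList<>(latest.values());
    }

    /** Drops tombstones, modelling what happens once delete.retention.ms has elapsed. */
    static List<Rec> expireTombstones(List<Rec> compacted) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : compacted) {
            if (r.value != null) out.add(r);
        }
        return out;
    }

    /** Consumer side: apply one record to local state, treating null as a delete. */
    static void apply(Map<String, String> state, Rec r) {
        if (r.value == null) {
            state.remove(r.key);
        } else {
            state.put(r.key, r.value);
        }
    }
}
```

A consumer that replays the log before the tombstone expires removes the user from its state. One that reconnects only after expireTombstones has run never sees a record for that key, so its stale local copy survives. That is the failure delete.retention.ms is sized to prevent.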

Critical requirement: Topics must be keyed by user ID. Random keys make tombstones useless.

Solution 3: Crypto Shredding

Encrypt data per user. On deletion, destroy the key. The ciphertext remains but becomes meaningless. See Conduktor's encryption guide for implementing field-level encryption without code changes.

Master KEK (KMS) → User DEKs → Encrypted Records
                      ↓
              Delete on GDPR request

Encrypt on produce:

SecretKey dek = keyStore.getOrCreateDek(userId);
byte[] ciphertext = encrypt(dek, plaintext);

Delete on erasure:

keyStore.deleteDek(userId);  // Data becomes unreadable
auditLog.record("GDPR_ERASURE", userId, Instant.now());

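GDPR does not prescribe a technical method of erasure. Crypto shredding relies on the common interpretation that data which can no longer be decrypted is effectively erased, and destroying the key achieves this without modifying Kafka's immutable log. This only holds if every copy of the key is destroyed, including backups and caches.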
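The encrypt and delete steps can be sketched end to end with the JDK's built-in AES-GCM support. This is a minimal illustration under stated assumptions: CryptoShredder is a hypothetical class, the in-memory map stands in for a real key store, and wrapping the DEKs with the KMS master key is omitted.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Illustrative crypto-shredding sketch; the map stands in for a durable key store. */
class CryptoShredder {
    private static final int IV_BYTES = 12;   // nonce size recommended for GCM
    private static final int TAG_BITS = 128;

    private final Map<String, SecretKey> deks = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Returns the user's DEK, generating a fresh AES-256 key on first use. */
    SecretKey getOrCreateDek(String userId) {
        return deks.computeIfAbsent(userId, id -> {
            try {
                KeyGenerator gen = KeyGenerator.getInstance("AES");
                gen.init(256);
                return gen.generateKey();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    /** Encrypts with AES-GCM; the random IV is prepended to the ciphertext. */
    byte[] encrypt(String userId, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, getOrCreateDek(userId), new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] out = new byte[IV_BYTES + body.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(body, 0, out, IV_BYTES, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts a stored value; throws if the user's DEK has been shredded. */
    byte[] decrypt(String userId, byte[] stored) {
        SecretKey dek = deks.get(userId);
        if (dek == null) {
            throw new IllegalStateException("DEK shredded for " + userId);
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Erasure: drop the key; ciphertext still sitting in Kafka can no longer be decrypted. */
    void deleteDek(String userId) {
        deks.remove(userId);
    }
}
```

After deleteDek("user123"), any attempt to decrypt that user's old records fails, while other users' records stay readable. If the same user produces data again later, a new DEK is generated and the old ciphertext stays unreadable.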

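Cost consideration: Cloud KMS services typically charge per key. At scale, keep only the master key in the KMS and derive each user's DEK from the master key, the user ID, and a random per-user salt held in your own key store. Deleting the salt shreds the data. Deriving from the master key and user ID alone would not, because the same DEK could be recomputed at any time.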
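A derived-key scheme still needs one per-user secret that can be destroyed. The sketch below assumes a random salt per user and a simple HMAC-SHA256 derivation; DerivedKeyStore is a hypothetical class, and a standard KDF such as HKDF would be the more conventional choice in production.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative derived-key store: one master key, one deletable salt per user. */
class DerivedKeyStore {
    private final byte[] masterKey;   // assumption: fetched from or unwrapped by the KMS
    private final Map<String, byte[]> salts = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    DerivedKeyStore(byte[] masterKey) {
        this.masterKey = masterKey.clone();
    }

    /** DEK = HMAC-SHA256(masterKey, salt || userId); the salt is created on first use. */
    SecretKey getOrCreateDek(String userId) {
        byte[] salt = salts.computeIfAbsent(userId, id -> {
            byte[] s = new byte[32];
            random.nextBytes(s);
            return s;
        });
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            mac.update(salt);
            mac.update(userId.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(mac.doFinal(), "AES");   // 32 bytes, usable as an AES-256 key
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Erasure: without the salt, the old DEK cannot be derived again. */
    void shred(String userId) {
        salts.remove(userId);
    }
}
```

The KMS holds a single key regardless of user count, and the per-user state shrinks to a 32-byte salt. The salt store then carries the same durability and backup-purging obligations that a DEK store would.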

Solution 4: Separate PII from Analytics

events-with-pii (28-day retention) → transform → events-anonymized (indefinite)

A stream processor strips or hashes PII:

piiEvents
    .mapValues(event -> new AnonymizedEvent(
        hash(event.getUserId()),
        event.getCountry(),
        event.getTimestamp()
    ))
    .to("events-anonymized");

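GDPR erasure then only has to reach the PII topic, and analytics continue uninterrupted, provided the second topic is genuinely anonymous. A plain hash of a user ID is still pseudonymous personal data if candidate IDs can be guessed and re-hashed. Use a keyed hash whose key you can destroy, or drop the identifier entirely.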
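The hash function in the snippet above is not defined in the article. One possible implementation is a keyed hash (HMAC-SHA256), sketched here under that assumption; Pseudonymizer is a hypothetical class name, and the secret would have to live outside the analytics system.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative keyed hash for user IDs; an assumed implementation of hash(). */
class Pseudonymizer {
    private final SecretKeySpec key;

    Pseudonymizer(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** Returns HMAC-SHA256(secret, userId) as 64 lowercase hex characters. */
    String hash(String userId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] digest = mac.doFinal(userId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same user always maps to the same token, so per-user aggregates keep working, but the token cannot be recomputed without the secret. While the secret exists the output remains pseudonymous rather than anonymous.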

Downstream Systems

Kafka is rarely the only place personal data lives. Deletion must cascade to:

  • ksqlDB materialized views
  • Databases via Kafka Connect
  • Search indices
  • Caches

Pattern: Produce to a user-deletions topic. All downstream systems consume and purge:

producer.send(new ProducerRecord<>("user-deletions", userId, new DeletionEvent(userId)));
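On the consuming side, each downstream system needs a purge operation that is safe to repeat, since deletion events may be redelivered. The following sketch shows one way to fan a deletion out; Purgeable, MapStore, and DeletionHandler are hypothetical names, and the Kafka consumer loop that would call handle is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical contract that each downstream system implements. */
interface Purgeable {
    String name();
    void purge(String userId);   // must be idempotent: events can be redelivered
}

/** Stand-in for a cache, search index, or database table keyed by user ID. */
class MapStore implements Purgeable {
    private final String name;
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    MapStore(String name) { this.name = name; }

    void put(String userId, String row) { rows.put(userId, row); }
    boolean contains(String userId) { return rows.containsKey(userId); }

    @Override public String name() { return name; }
    @Override public void purge(String userId) { rows.remove(userId); }
}

/** Applies one deletion event to every registered downstream system. */
class DeletionHandler {
    private final List<Purgeable> targets;

    DeletionHandler(List<Purgeable> targets) {
        this.targets = List.copyOf(targets);
    }

    /**
     * Purges the user from every target and returns the names of targets that
     * failed, so the caller can retry instead of committing the offset.
     */
    List<String> handle(String userId) {
        List<String> failed = new ArrayList<>();
        for (Purgeable target : targets) {
            try {
                target.purge(userId);
            } catch (RuntimeException e) {
                failed.add(target.name());
            }
        }
        return failed;
    }
}
```

Returning the failed targets rather than throwing lets the caller keep the event uncommitted until every system has confirmed the purge, which also gives the audit trail a clear completion point.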

Choosing a Strategy

Scenario                         Strategy
All topics < 28 days retention   Short retention
Topics keyed by user ID          Tombstones
Long retention required          Crypto shredding
PII mixed with analytics         Separate topics

Most organizations use a combination: short retention for transient data, crypto shredding for sensitive data requiring long retention.
