Stop Calling Your Kafka Topics Data Products

Stéphane Derosiaux, February 3, 2026

Think about what a Kafka topic actually is. A name, and a log of bytes behind it. Any application with the right ACL writes to it. Any application with the right ACL reads from it. Nothing in Kafka enforces what's supposed to be inside, who put it there, or whether it will still look the same next week.

That's a global variable. Shared, mutable, reachable from anywhere, with no type and no owner.
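To make the analogy concrete, here is a minimal sketch in plain Python. It does not use a Kafka client: `ORDERS_TOPIC`, `produce` and `count_readable` are hypothetical names, and the module-level list only stands in for a topic. Three writers append three different shapes, and the reader finds out at read time.

```python
import json

# A module-level list standing in for a topic: a name with a log of bytes behind it.
ORDERS_TOPIC: list[bytes] = []


def produce(payload: bytes) -> None:
    """Anyone who can reach the name can append anything."""
    ORDERS_TOPIC.append(payload)


def count_readable(log: list[bytes]) -> tuple[int, int]:
    """A consumer that assumes every message is JSON with a numeric 'amount'."""
    ok, broken = 0, 0
    for raw in log:
        try:
            order = json.loads(raw)
            float(order["amount"])
            ok += 1
        except (ValueError, KeyError, TypeError):
            broken += 1
    return ok, broken


produce(json.dumps({"order_id": "42", "amount": 19.99}).encode())  # the shape the reader expects
produce(json.dumps({"id": 42, "total": "19.99"}).encode())         # another team, another shape
produce(b"not even JSON")                                          # nothing stops this either

print(count_readable(ORDERS_TOPIC))  # (1, 2)
```

Nothing in the write path rejected the second or third message. The only place the problem shows up is in the consumer.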

We spent years learning not to build software on global mutable state. Then a lot of us moved the company's nervous system onto Kafka, kept the same free-for-all, and started calling the topics "data products" because someone wrote a page in Confluence.

"In a highly distributed environment, one of the most painful things to know is who owns what." — Engineering leader at an observability company

He's describing shared state. A topic everyone touches and nobody answers for. The README doesn't help him when something breaks and he can't even tell whose data it is.

A log being read by many teams is fine; that's what a log is for. The smell is the mutable, unowned, unguaranteed part: anyone writes any shape, nobody answers for it, nothing protects the readers. The sharing isn't the problem. The absence of everything around the sharing is.

A README doesn't make a topic a product

🚫 "We have data products. They're documented in Confluence."

Documentation describes the state. It doesn't constrain it, doesn't own it, doesn't promise anything about it. The producer can still change the shape of the message tomorrow, and the page goes stale the moment they deploy (it always does).

A product answers questions a wiki page can't:

  • Who owns this, and who do I page when it's wrong?
  • What's actually guaranteed (freshness, uptime, the shape of the payload), and what happens to me, the consumer, when that guarantee breaks?
  • Can I find it without asking around in Slack?

If those don't have answers, you don't have a data product. You have a global variable with a comment on it.

What a product actually is

Go back to code for a second. When you want another team to use your data, you don't hand them a pointer to your internal state and a note asking them politely not to mutate it. You give them an interface.

An interface is four things:

  • a signature (the shape is fixed, and something stops you from breaking it),
  • an owner (a name on the other end when it misbehaves),
  • a guarantee (it does what it says, within limits you can design against),
  • an address (you can find it and depend on it without knowing the internals).

A Kafka data product is those same four things wrapped around a topic. Nothing more exotic than that.
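One way to picture those four things is as a small descriptor that sits next to the topic. This is a sketch under assumptions, not a standard format: the `DataProduct` class, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor: the four interface properties around one topic."""

    topic: str                    # the underlying log, unchanged
    schema_subject: str           # signature: where the enforced schema lives
    owner_team: str               # owner: who answers when the data is wrong
    oncall_contact: str           # owner: who gets paged
    freshness_slo_seconds: float  # guarantee: a limit consumers can design against
    catalog_path: str             # address: where consumers find it

    def missing_properties(self) -> list[str]:
        """Return the interface properties that are still blank."""
        checks = {
            "signature": bool(self.schema_subject),
            "owner": bool(self.owner_team and self.oncall_contact),
            "guarantee": self.freshness_slo_seconds > 0,
            "address": bool(self.catalog_path),
        }
        return [name for name, present in checks.items() if not present]


orders = DataProduct(
    topic="orders-processed",
    schema_subject="orders-processed-value",
    owner_team="payments",
    oncall_contact="payments-oncall",
    freshness_slo_seconds=1.0,
    catalog_path="commerce/orders",
)
raw_topic = DataProduct("orders-v2", "", "", "", 0.0, "")

print(orders.missing_properties())     # []
print(raw_topic.missing_properties())  # ['signature', 'owner', 'guarantee', 'address']
```

A filled-in descriptor is still only a declaration. It becomes a product when something enforces each field, which is what the following sections are about.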

Figure: Left, a raw topic as shared state, with producers writing any shape and consumers reading with no contract, no owner and no guarantee. Right, the same topic wrapped as a data product: an interface with an enforced contract, a named owner, an SLA and a catalog entry that consumers can safely build on.

The log underneath doesn't change. What changes is everything around it: who's accountable, what's promised, what's enforced, what's findable. A data product is a topic plus the things that let another team build on it without trusting you personally.

A contract is enforced, not documented

Schema-on-a-wiki is not a contract. A contract is enforced: the producer cannot ship a change that breaks the people reading from them.

An engineer at an online real-estate company described exactly the work a missing contract creates:

"We're upgrading one of our client libraries, and it actually involves breaking schema changes. So we're doing a lot of manual updating of compatibility mode within the schema registry." — Engineer at an online real-estate company

Manual, careful, and easy to get wrong. With no enforced contract, a producer's deploy is a gamble for every consumer downstream. Enforce it, and the incompatible change is rejected at the boundary, before it reaches anyone.

Figure: Without a contract, a producer ships an incompatible v2 and the change lands in the topic, breaking the consumer downstream. With a contract enforced at the boundary, the same incompatible v2 is rejected before it reaches the topic, and the consumer keeps reading safely.

That's what data contracts and Schema Registry compatibility are for: turning "we try not to break you" into "you can't break me." The contract is the type signature of your data product. (Yes, even on a topic full of hand-rolled JSON.)
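Rejection at the boundary can be sketched in a few lines. This is a toy rule, not the Schema Registry's actual compatibility algorithm: schemas are reduced to a `{field: type}` dict, and the names `breaking_changes`, `register` and `IncompatibleSchemaError` are hypothetical. The assumed rule is that every field existing readers rely on must survive with the same type.

```python
class IncompatibleSchemaError(Exception):
    """Raised when a proposed schema would break existing readers."""


def breaking_changes(current: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """List the ways `proposed` breaks a consumer built against `current`."""
    problems = []
    for field, type_name in current.items():
        if field not in proposed:
            problems.append(f"removed field '{field}'")
        elif proposed[field] != type_name:
            problems.append(f"changed '{field}' from {type_name} to {proposed[field]}")
    return problems


def register(current: dict[str, str], proposed: dict[str, str]) -> dict[str, str]:
    """Gate at the boundary: accept the proposed schema or refuse it."""
    problems = breaking_changes(current, proposed)
    if problems:
        raise IncompatibleSchemaError("; ".join(problems))
    return proposed


v1 = {"order_id": "string", "amount": "double"}
v2_additive = {"order_id": "string", "amount": "double", "currency": "string"}
v2_breaking = {"order_id": "int", "total": "double"}

register(v1, v2_additive)  # accepted: existing readers are unaffected

try:
    register(v1, v2_breaking)
except IncompatibleSchemaError as err:
    print(f"rejected: {err}")
```

The useful property is where the failure happens: in the producer's deploy, with a message naming the broken fields, instead of in a consumer at 3 a.m.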

An owner, not a tag

A product has someone on the other end. Not a "team: payments" label on the topic (a label answers nothing the moment something breaks), but a team whose job includes keeping this data good, and who gets paged when it isn't.

I've written a whole piece on why ownership is the first domino of a Kafka platform, so I won't relitigate it here. For data products the point is narrow: without an owner, every other property is a wish. A contract nobody maintains drifts. An SLA nobody answers for is marketing. A catalog entry pointing at an abandoned topic is a trap.

A guarantee you can build on

A guarantee is a promise the consumer can design against. Not aspirational uptime on a slide, but an actual commitment that's measured and answered for.

The right guarantee depends on who's reading:

  • a fraud check needs your events fresh within a second,
  • a nightly report needs completeness and doesn't care about latency.

Product thinking, applied to a topic, is mostly this: knowing who depends on you, and committing to what they actually need.
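The two readers above can be sketched as two different checks against the same topic. Everything here is illustrative: the function names, the thresholds and the record counts are assumptions, and a real measurement would come from your monitoring rather than from hard-coded values.

```python
from datetime import datetime, timedelta, timezone


def freshness_met(newest_event: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness: the newest event in the topic is no older than max_age."""
    return now - newest_event <= max_age


def completeness_met(expected_records: int, received_records: int) -> bool:
    """Completeness: every record the producer emitted has arrived."""
    return received_records >= expected_records


# One observed state of the same topic (illustrative numbers).
now = datetime(2026, 2, 3, 12, 0, 0, tzinfo=timezone.utc)
newest_event = now - timedelta(seconds=5)
expected_records, received_records = 10_000, 10_000

# The fraud check is promised freshness within one second.
fraud_check_ok = freshness_met(newest_event, now, max_age=timedelta(seconds=1))
# The nightly report is promised completeness and ignores latency.
nightly_report_ok = completeness_met(expected_records, received_records)

print(f"fraud check guarantee met: {fraud_check_ok}")        # False
print(f"nightly report guarantee met: {nightly_report_ok}")  # True
```

The same five-second lag is a breach for one consumer and irrelevant to the other, which is why the guarantee has to be written per reader and not per topic.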

We work with an airline where Kafka sits on the path that gets planes off the ground, with a recovery window on the data products that matter measured in hours, not in shrugs. At that point "best effort" isn't a strategy, it's a liability.

Discoverable, or it gets rebuilt

A product you can't find isn't a product, it's a duplicate waiting to happen. Someone needs order data, can't find orders-processed, and creates orders-v2. Now there are two topics, two half-maintained schemas, and no clean answer to which one is true.

"I can't call the APIs if I don't know what it looks like on your end." — Engineer at a commerce company

Same thing inside your own walls. You can't build on data you can't see, can't understand, and can't reach without a favour. Discovery is the import statement of a data product: a catalog where a team searches "orders", finds the real one, sees its owner, its schema, its guarantees, and who's already consuming it. (Then they reuse it, instead of minting orders-v3.)
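A minimal sketch of that search, assuming an in-memory catalog: `CatalogEntry`, `search` and every entry below are hypothetical, and a real catalog would be a service, not a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record for one data product."""

    topic: str
    owner: str
    schema_subject: str
    guarantee: str
    consumers: tuple[str, ...] = ()
    tags: tuple[str, ...] = ()


def search(catalog: list[CatalogEntry], term: str) -> list[CatalogEntry]:
    """Match the term against topic names and tags, case-insensitively."""
    needle = term.lower()
    return [
        entry
        for entry in catalog
        if needle in entry.topic.lower()
        or any(needle in tag.lower() for tag in entry.tags)
    ]


catalog = [
    CatalogEntry(
        topic="orders-processed",
        owner="payments",
        schema_subject="orders-processed-value",
        guarantee="fresh within 1s",
        consumers=("fraud-check", "nightly-report"),
        tags=("orders", "commerce"),
    ),
    CatalogEntry(
        topic="shipments-dispatched",
        owner="logistics",
        schema_subject="shipments-dispatched-value",
        guarantee="complete by 02:00 UTC",
        tags=("shipping",),
    ),
]

for entry in search(catalog, "orders"):
    print(entry.topic, entry.owner, entry.guarantee, entry.consumers)
```

One search returns the topic, its owner, its guarantee and its current consumers, which is the information someone needs before deciding to reuse a topic or create a new one.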

Stop shipping shared state

Turning a topic into a data product is the work of wrapping a global variable in an interface: a contract that's enforced, an owner who answers, a guarantee a consumer can build on, an address they can find. That's the whole difference between data other teams hope about and data they build on.

A README on a topic is a comment on a global variable. Ship the interface instead.


