Whitepaper
Where Kafka Costs Hide: A Field Guide
For platform leads, Kafka architects, and streaming engineers who know Kafka is expensive but can't fully explain where or why. A structured way to find, quantify, and act on the hidden costs in your estate.

Before getting into the patterns, a quick frame on what makes up the total cost of ownership of a Kafka estate. The conversation often collapses into "the bill" when the real picture is broader. Four layers contribute.
| Layer | What it covers | How it surfaces |
|---|---|---|
| Infrastructure | Compute for brokers and controllers, storage multiplied by replication factor, networking | Hosted: capacity units sized by throughput, partition count, and storage. Self-managed: servers, storage arrays, datacenter capacity, network gear |
| Ecosystem tooling | Schema Registry, Kafka Connect, stream processing engines, cross-cluster replication, monitoring | Hosted: paid add-ons. Self-managed: internal compute and operational time |
| Vendor and licensing | Platform licenses, tier surcharges, feature add-ons, support contracts, professional services | Negotiated annually or multi-year, often opaque to the platform team |
| Operational | Engineering time spent running clusters, responding to incidents, supporting internal consumers | Self-managed: full broker operations burden. Hosted: cost analysis, configuration management, vendor coordination |
Each of these is a real line item in any Kafka environment, and any serious cost conversation has to acknowledge that Kafka is rarely just "the brokers." Each layer carries some mix of value-producing spend and structural drift.
Infrastructure is where the bill itself can be reduced most directly. Ecosystem tooling costs follow architectural decisions the business has already committed to, and pulling them back means rethinking the use cases those tools support, not adjusting a config. Vendor and licensing costs are typically locked in for the contract term, so reductions there land at renewal rather than month-to-month.
The operational layer is real and worth its own treatment, but the mechanisms there look different (governance, automation, federated ownership) and produce a different kind of return: not a smaller bill, but a small platform team able to support a growing number of internal Kafka users without growing in proportion.
Of the four layers, infrastructure typically holds the largest share of recoverable spend, and it is where this guide focuses.
Within infrastructure, six patterns account for most of the addressable waste in the estates we've analyzed. Each gets its own section below. The patterns rarely sit in isolation: decisions made early in an estate's life cascade through several patterns at once, and the largest savings come from recognizing where they connect, which we call out as we go.
Partition overprovisioning
Pattern
Are partition counts justified by the topic's actual throughput?
What the pattern looks like
Topics whose partition counts far exceed what their actual throughput requires. The count was usually set at creation against a peak load that never materialized, inherited from another topic, or pulled from a default template tuned for a different use case.
Why it happens
Partition counts are set at topic creation and cannot be reduced afterward without recreating the topic end to end and coordinating the switch with every producer and consumer. At creation time, nobody knows what throughput will eventually look like, so teams fall back on heuristics:
- Copy the count from another topic that seems similar.
- Apply a rule of thumb like "three times the broker count."
- Pick a number that feels safe.
There is no feedback loop that says "you overprovisioned this by 10x," and it compounds silently. Internal topics created by stream processing engines inherit the source partition count, so one early decision cascades across every downstream operation.
How to identify it
Two queries get you most of the way there:
- List topics by partition count, sorted descending.
- For the top of that list, pull average throughput per partition over the last 30 days.
Topics where average throughput per partition is far below the meaningful threshold for partition parallelism are candidates for concentration. In practice, a topic with 50 partitions averaging less than a megabyte per second per partition is almost certainly overprovisioned. The exception is workloads with bursty traffic that need the partition count to absorb peaks, so filter for those carefully.
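A minimal sketch of those two queries, using the confluent-kafka AdminClient. The bootstrap address, the 1 MB/s threshold, and the throughput lookup (a stub standing in for whatever metrics system holds your 30-day averages) are all assumptions to adapt:

```python
# Sketch: list topics by partition count, then flag concentration candidates
# whose per-partition throughput sits below the parallelism threshold.
from confluent_kafka.admin import AdminClient

REPLICATION_FACTOR = 3               # typical RF; adjust to your estate
THRESHOLD_MB_S = 1.0                 # assumed floor for meaningful parallelism

def avg_throughput_mb_s(topic: str) -> float:
    """Stub: replace with a 30-day average from your metrics system."""
    return 0.0

admin = AdminClient({"bootstrap.servers": "localhost:9092"})   # assumed address
metadata = admin.list_topics(timeout=10)

by_partition_count = sorted(
    ((name, len(t.partitions)) for name, t in metadata.topics.items()
     if not name.startswith("__")),  # skip internal topics
    key=lambda item: item[1],
    reverse=True,
)

for name, partitions in by_partition_count[:50]:
    replicas = partitions * REPLICATION_FACTOR      # what brokers actually carry
    per_partition = avg_throughput_mb_s(name) / partitions
    if per_partition < THRESHOLD_MB_S:
        print(f"{name}: {partitions} partitions ({replicas} partition-replicas), "
              f"{per_partition:.2f} MB/s per partition -> concentration candidate")
```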
What it costs
The partition count shown in a topic's configuration counts only leaders. What actually counts against broker capacity is partition-replicas (leaders × replication factor). With the typical RF of 3, a topic with 50 leader partitions takes 150 slots of broker capacity. When you sort the worst offenders, the relevant number is partition-replicas, not the leader count.
The cost impact scales steeply with estate size. On small estates (a single cluster, a few hundred topics), partition overprovisioning is often invisible: the cluster runs fine because partition pressure is below the threshold where it triggers operational events. On medium estates (multi-cluster, thousands of topics), capacity tier expansions start getting triggered earlier than usage justifies, and the cumulative count begins to move the bill. On large estates (tens of thousands of topics across multiple clusters), the absolute waste grows substantially; concentration analyses routinely identify 50 to 70 percent of partitions as eligible for retirement.
The mechanism is the same on hosted and self-managed estates: brokers max out at roughly 4,000 partition-replicas each, including leaders and followers. The savings from partition concentration tend to be larger than the per-topic math suggests, because each consolidated topic delays the next capacity expansion on hosted platforms, or the next hardware procurement cycle on self-managed ones.
Brokers max out at roughly 4,000 partition-replicas each. Once cumulative partition decisions push the cluster even one partition-replica over that ceiling, the platform's only answer is a full new capacity unit. Tiny change in demand, doubled bill.
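To make the step function concrete, a back-of-the-envelope calculation. The ~4,000 partition-replica ceiling is the figure above; the estate shape is invented purely to show how concentration delays the next expansion:

```python
# Back-of-the-envelope: how cumulative partition decisions trip the next
# capacity step. The estate shape is illustrative, not real data.
import math

CEILING_PER_BROKER = 4000            # partition-replicas, leaders plus followers
RF = 3

estate = [                           # (number of topics, partitions per topic)
    (200, 50),                       # the overprovisioned tier
    (800, 12),
    (2000, 3),
]

def brokers_needed(shape):
    partition_replicas = sum(count * parts * RF for count, parts in shape)
    return partition_replicas, math.ceil(partition_replicas / CEILING_PER_BROKER)

before = brokers_needed(estate)
after = brokers_needed([(200, 10), (800, 12), (2000, 3)])   # 50 -> 10 partitions

print(f"before concentration: {before[0]} partition-replicas -> {before[1]} brokers minimum")
print(f"after concentration:  {after[0]} partition-replicas -> {after[1]} brokers minimum")
```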
How this cascades
Partition overprovisioning is rarely contained at the source topic. Stream processing engines (Kafka Streams, ksqlDB) create internal topics that inherit the source partition count, so an overprovisioned source cascades through every derived stream downstream. The fix is the same (right-size the source), but the leverage compounds because every dependent operation inherits the gain.
Stories from the field
Two-thirds of topics eligible for concentration on a "well-tuned" estate.
The setup. A platform team running on a managed cloud platform assumed their environment was reasonably well tuned. The estate consisted of a handful of clusters split between non-prod and prod, carrying thousands of topics, tens of thousands of partitions, and several terabytes of storage.
The finding. The concentration analysis surfaced patterns invisible from inside day-to-day operations. Roughly two-thirds of topics were eligible for partition concentration, with counts that could be collapsed toward 1 without loss of functionality. A similar share of total partition-replicas could be retired. The managed-platform capacity units the clusters had been sized for could drop by more than half. Projected annual savings ran well into the six figures, before any architectural change.
Where the waste concentrated. The single largest share sat in the non-prod footprint: hundreds of per-developer test topics created by branching an existing topic, dead letter topics provisioned at high partition counts with no consumer activity, and an entire non-prod cluster carrying hundreds of topics on effectively zero throughput. The prod clusters told a subtler version of the same story, with partition counts set against peak load profiles from an earlier phase of the architecture and never revisited as traffic patterns evolved. Roughly half of the partitions on one prod cluster were collapsible without operational impact.
Retention misalignment
Pattern
Does retention match how consumers actually read?
What the pattern looks like
Long retention applied uniformly across topics with very different consumption patterns.
- Topics retaining seven days of data when consumers read the most recent hour.
- Topics retaining indefinitely because someone once worried they might need to replay, even though no replay has ever happened.
Why it happens
Retention defaults get set early and rarely revisited. The cost of "more retention" is invisible until storage hits a threshold; the cost of "less retention" is a potential incident if a consumer falls behind. The asymmetry makes generous retention the safe choice in any individual decision, even when the cumulative effect is significant.
The mechanism also has more moving parts than most teams realize. retention.ms doesn't operate alone. Three related configurations interact with it:
- segment.ms (default 7 days). Segments only become eligible for deletion after they close. A topic with 1-hour retention but the default segment.ms can hold a week of data on a low-throughput partition because the segment never rolls.
- segment.bytes (default 1 GB). A second trigger for segment closure. Whichever boundary is hit first (time or size) closes the segment.
- retention.bytes (per partition). A topic with retention.bytes=10 GB and 50 partitions can consume 500 GB before retention kicks in.
Most teams set retention.ms once and never look at the related configurations.
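Those four settings can be audited together. A sketch using the confluent-kafka AdminClient, with the bootstrap address as an assumption:

```python
# Sketch: read the four retention-related settings for every topic so that
# retention.ms is never reviewed in isolation.
from confluent_kafka.admin import AdminClient, ConfigResource

KEYS = ("retention.ms", "retention.bytes", "segment.ms", "segment.bytes")

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
topic_names = [name for name in admin.list_topics(timeout=10).topics
               if not name.startswith("__")]

resources = [ConfigResource(ConfigResource.Type.TOPIC, name) for name in topic_names]
for resource, future in admin.describe_configs(resources).items():
    configs = future.result()
    settings = {key: configs[key].value for key in KEYS if key in configs}
    print(resource.name, settings)
```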
How to identify it
Compare each topic's configured retention to the actual consumer-side read pattern. Look at each consumer group's committed offset position relative to the log-end-offset over time. Groups that consistently sit within the most recent few hours of data and never trail back further indicate consumers reading recent data only. Where this is true and there is no replay history, retention can usually be reduced toward the actual read window.
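A snapshot version of that check, for one consumer group on one topic, using the confluent-kafka client. Group id, topic name, and bootstrap address are assumptions; offset lag is only a rough proxy for the read window, and the pattern needs to hold over time (or against your lag-metrics history) before acting on it:

```python
# Sketch: compare one group's committed offsets to the log-end offset per
# partition. Converting offset lag into an actual time window needs message
# rates or timestamps from your metrics system.
from confluent_kafka import Consumer, TopicPartition

GROUP, TOPIC = "orders-consumer", "order-events"     # assumed names
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",           # assumed address
    "group.id": GROUP,
    "enable.auto.commit": False,
})

partition_ids = consumer.list_topics(TOPIC, timeout=10).topics[TOPIC].partitions
assignments = [TopicPartition(TOPIC, p) for p in partition_ids]

for tp in consumer.committed(assignments, timeout=10):
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    lag = high - tp.offset if tp.offset >= 0 else None   # offset < 0 means no commit
    print(f"partition {tp.partition}: committed={tp.offset}, log-end={high}, lag={lag}")

consumer.close()
```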
The harder cases are topics where retention is doing real work and a configuration change is not the answer:
- Topics used for replay (historical reprocessing after a bug fix or schema change)
- Topics under a compliance window
- Topics with downstream batch consumption
Audit three more things alongside retention.ms:
- segment.ms: whether segments are rolling fast enough for the configured retention to take effect.
- retention.bytes: per-partition, so total retention = retention.bytes × partition count × replication factor.
- The gap between configured retention and actual disk usage: large gaps usually trace back to segment-roll issues or producer timestamp skew rather than the retention setting itself.
What it costs
Storage cost scales linearly with retention, but the multipliers are easy to underestimate.
Total storage per topic = per-partition throughput × retention × partition count × replication factor
With the typical RF of 3, every gigabyte of useful data takes three gigabytes of disk. On managed platforms, generous retention either drives up direct storage charges or pushes the cluster into a larger capacity tier. On self-managed, it consumes disk that could otherwise host other workloads. Across a large estate, halving retention on the topics where it is safe to do so can reduce total storage by 20 to 40 percent.
Configured retention is a minimum, not a maximum. Actual retention is shaped by segment timing:
Actual retention = configured retention + segment roll time (up to 7 days) + check interval (5 min) + deletion delay (1 min)
A cluster sized for 500 GB based on throughput times retention can run at 800 GB or more once segment timing is accounted for.
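A worked example of the two formulas, with invented inputs:

```python
# Worked example of the sizing formulas above; all inputs are illustrative.
per_partition_mb_s = 0.4    # average ingress per partition
partitions = 12
retention_hours = 24
rf = 3

useful_gb = per_partition_mb_s * retention_hours * 3600 * partitions / 1024
on_disk_gb = useful_gb * rf                      # replication multiplier
print(f"naive sizing: {on_disk_gb:.0f} GB on disk for {useful_gb:.0f} GB of useful data")

# Actual retention = configured retention + segment roll + check interval + delete delay,
# so disk usage overshoots the naive number whenever segments roll slowly.
segment_roll_hours = 12                          # assumed time for segments to close
actual_hours = retention_hours + segment_roll_hours + 5 / 60 + 1 / 60
print(f"with segment padding: {on_disk_gb * actual_hours / retention_hours:.0f} GB on disk")
```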
For long retention windows driven by compliance or replay needs, tiered storage is worth evaluating as an architectural option rather than a configuration change. Offloading cold segments to object storage removes the replication multiplier on the cold tier (object storage handles durability internally), which can reduce per-gigabyte cost on historical data by roughly 3 to 9x. The tradeoff is read latency on cold data, which is acceptable for compliance archives but not hot consumer paths.
Stories from the field
1-hour retention. 7 days of disk used.
A platform team set 1-hour retention on a low-throughput logging topic and ran out of disk after a week. The cause was segment.ms still at the 7-day default. Segments were not rolling, so retention had nothing to delete.
The gap between "what should be 1 hour of data" and "what was actually 7 days" was completely invisible from the topic config alone, since retention.ms read 1 hour as expected. The fix was a one-line config change to roll segments every 30 minutes.
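The equivalent change, expressed against the admin API as a sketch. The topic name and bootstrap address are assumptions, and note that the non-incremental AlterConfigs path shown here resets unspecified topic overrides to their defaults, so on a real estate prefer an incremental alter or your usual config tooling:

```python
# Sketch of the one-line fix from the story: roll segments every 30 minutes
# so a 1-hour retention.ms actually has closed segments to delete.
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})   # assumed address
resource = ConfigResource(
    ConfigResource.Type.TOPIC,
    "app-logs",                                     # assumed topic name
    set_config={"segment.ms": str(30 * 60 * 1000)}, # 30 minutes
)

for res, future in admin.alter_configs([resource]).items():
    future.result()                                 # raises if the broker rejected the change
    print(f"updated {res.name}")
```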
Cluster sprawl
Pattern
Is each cluster doing real work?
What the pattern looks like
Many clusters, most of them underutilized. Often the result of a "one project, one cluster" strategy adopted to provide isolation between teams or workloads.
The problem is not multi-cluster as such. Plenty of estates run multiple clusters for legitimate reasons:
- Environment separation (dev, staging, prod)
- Geographic distribution to keep clusters near users
- Data residency requirements (GDPR, HIPAA)
- Isolating workload classes that genuinely do not mix (high-throughput batch versus real-time, large multi-tenant customers)
Sprawl is what happens when clusters get added beyond those reasons, or stood up at adoption and never revisited as the estate matures. The patterns we see most often:
- A cluster per project created for organizational reasons rather than technical ones.
- Environment stratification that went a level too deep: separate dev and QA clusters where one would have done.
- On-prem patterns carried into cloud unchanged: clusters shaped by on-premises constraints that got migrated as-is, where the cost is now much more visible.
Why it happens
Multi-tenancy in Kafka is genuinely hard, and per-cluster isolation solved real problems:
- Blast radius containment
- Clean ownership
- Security boundaries
- Avoiding the operational complexity of quotas and ACLs
On hosted platforms, spinning up a new cluster is frictionless and the cost is absorbed into a platform budget that individual project teams never see. On self-managed estates, the strategy is more often driven by compliance isolation requirements. Both paths lead to the same outcome: cluster count growing faster than the workload, with each cluster carrying the same baseline cost regardless of how much traffic it handles.
Spawning a new cluster instead of building well-tuned multi-tenancy is often a rational substitute for operational expertise that the platform team has not been resourced for. Running a single large multi-tenant cluster well requires deep tuning experience and active governance, while standing up a new cluster per project does not.
The substitution works in the moment, but it compounds: each new cluster reproduces the same trade rather than building the expertise that would let one cluster serve more workloads.
How to identify it
Pull three utilization ratios per cluster: peak ingress as a share of provisioned ingress, total partitions as a share of the platform's partition cap, and storage used as a share of provisioned storage. Clusters running below a meaningful threshold (typically 25 to 30 percent at peak load) on all three dimensions are consolidation candidates.
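A sketch of that first pass, with the utilization numbers standing in for whatever your monitoring or billing export provides; the 30 percent threshold mirrors the rule of thumb above:

```python
# Sketch: flag consolidation candidates from per-cluster utilization ratios.
# The numbers are illustrative, not real data.
THRESHOLD = 0.30

clusters = {
    "payments-prod":  {"ingress": 0.72, "partitions": 0.61, "storage": 0.55},
    "team-a-dev":     {"ingress": 0.04, "partitions": 0.12, "storage": 0.08},
    "legacy-reports": {"ingress": 0.09, "partitions": 0.22, "storage": 0.18},
}

for name, ratios in clusters.items():
    if all(value < THRESHOLD for value in ratios.values()):
        print(f"{name}: under {THRESHOLD:.0%} on all three dimensions -> consolidation candidate")
```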
A second pass on the list: ask which environment each cluster belongs to and whether the separations between them are doing real work. Four questions tend to surface consolidation candidates:
- Are dev and prod on different clusters? Usually justified.
- Are dev and QA on different clusters? Often not justified.
- Do regional clusters actually serve regional traffic? If not, candidates for consolidation.
- Does the same team operate multiple clusters for different applications? Usually not justified once the team has operational maturity for multi-tenancy.
The harder analysis is which clusters can be consolidated without disrupting the teams that depend on them, since the original isolation strategy created organizational dependencies. Configuration drift between clusters that started identical also makes consolidation harder over time, because "merge cluster A into B" surfaces every retention setting, ACL, and quota inconsistency that has accumulated since they diverged.
What it costs
The baseline cost of a Kafka cluster does not shrink with usage. Every cluster requires a minimum broker count, controller infrastructure, baseline storage, networking, and operational overhead. On hosted platforms, this surfaces as a minimum capacity unit allocation. A cluster running at 5 percent of capacity costs the same as one running at 50 percent if both sit at the same tier.
With 70 to 200 underutilized clusters in a sprawled estate, the aggregate baseline cost dwarfs the cost of the work actually being done. The aggregate waste is often larger than partition overprovisioning, but it is less visible because each individual cluster looks fine in isolation.
Operating ten clusters consistently is not ten times the work of operating one, but it is meaningfully more, and the gap grows as cluster count grows. Configuration drift, version skew, certificate expiry tracking, and per-cluster monitoring overhead all scale with the number of clusters even when traffic does not. That overhead becomes part of what the platform team spends its time on, and ties back to the operational overhead discussion later in this guide.
Stories from the field
200 clusters for 200 projects.
A platform team had grown to roughly 200 Kafka clusters supporting 200 projects, one cluster per project. The strategy made sense at adoption: per-cluster isolation avoided the operational complexity of multi-tenancy, gave each project clean ownership, and contained blast radius.
Each cluster carried the same baseline cost regardless of traffic, so the majority of the estate was paying full price for fractional utilization. There was also no mechanism for any team to share infrastructure with another, which meant the only growth path was more clusters.
How this cascades
Cluster sprawl rarely produces only a cluster cost. When data lives in separate clusters, cross-project sharing becomes meaningfully harder: consumers in one cluster cannot subscribe to topics in another without a replication setup.
The path of least resistance is to replicate the data each consumer needs into their own cluster, which is one of the primary drivers of Pattern 4 (topic proliferation and duplication). A decision to optimize for operational simplicity by spawning per-project clusters ends up generating both the direct cluster cost and the duplicate-topic cost downstream.
Cleanup that addresses only the cluster count without the duplicate topics is incomplete; cleanup that addresses only the topics without consolidating clusters tends to recreate the same duplicates over time.
Topic proliferation and duplication
Pattern
Are these topics still in active use?
What the pattern looks like
Two related dynamics. First, topics created for experiments, migrations, or short-lived projects that were never decommissioned. Second, duplicate topics created in parallel when teams cannot find an existing topic that already serves the same purpose, so they create their own.
Three sub-cases show up underneath this:
- Orphans. Topics with no recent traffic and no active consumers, abandoned in place because nobody owns the cleanup.
- Near-duplicates. Topics with similar schemas, similar names, or the same source data with slightly different transformations, created because a team did not find the existing version.
- Stream-processing topics. Internal topics created by Kafka Streams, ksqlDB, or Flink. They inherit the source topic's partition count, so an overprovisioned source cascades into every derived stream downstream. They also accumulate when stream processing is used as a workaround for filtering: because Kafka cannot scope what a consumer sees on the original topic, teams write a filtered subset to a new topic, which duplicates the data once per unique subset that needs to be exposed (one per geo region, tenant, or status).
Each sub-case has a different remediation path, and the diagnostic has to distinguish them.
Why it happens
Topics are easy to create and hard to delete. Most platform teams have no central catalog of who owns what or what is in active use. Without a discoverability mechanism, duplication is the path of least resistance: when a team cannot find what they need, recreating it is faster than tracking down the existing version.
Once a topic exists, deleting it requires confirming nobody is reading from it, a coordination exercise the platform team rarely has time for. The result is accumulation in both directions: more topics created than needed, and few of the unneeded ones ever retired.
Naming conventions either help or actively hurt.
- Names that encode business meaning (order-events-v3, customer-profile-snapshots) let a developer recognize a candidate for reuse at a glance.
- Names that encode implementation detail (svc-xyz-internal-v2, kstream-repartition-7d4f) tell that developer almost nothing, so the path of least resistance is to create their own.
Estates without enforced naming standards or a searchable catalog accumulate duplicates as a structural feature rather than a discipline failure.
How to identify it
Two queries plus two more for stream-processing topics:
- Topics with zero traffic over the last 30 to 60 days and no active consumer groups: candidates for retirement.
- Topics with similar schemas or naming patterns: candidates for consolidation.
- Internal topics produced by stream processing jobs whose partition count exceeds the actual throughput of the derived stream: candidates for source-side right-sizing that will cascade through the pipeline.
- Internal topics that are filtered subsets of an upstream topic (similar schema, smaller volume, single downstream consumer): candidates for review. If the filter is for convenience, the logic can usually move to the consumer. If the filter is enforcing a security or compliance boundary, the duplication may be the right cost, but it is worth confirming the boundary still holds.
The orphan case is the hard one. Many idle-looking topics are legitimately idle (DR topics, batch processing that runs monthly, schemas of record). Distinguishing legitimately idle topics from abandoned ones requires owner verification, which is itself a coordination problem.
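A crude first pass on the orphan query, flagging topics whose partitions hold no data at all. It narrows the list but says nothing about topics carrying a trickle of traffic or about who owns them; the bootstrap address is an assumption, and zero-traffic-over-30-days and consumer-group checks still need your metrics history and group listings:

```python
# Sketch: surface the emptiest orphan candidates via watermark offsets.
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "kafka-cost-audit",          # throwaway, never commits or joins
    "enable.auto.commit": False,
})

metadata = consumer.list_topics(timeout=10)
for name, topic in metadata.topics.items():
    if name.startswith("__"):                # skip internal topics
        continue
    watermarks = [consumer.get_watermark_offsets(TopicPartition(name, p), timeout=10)
                  for p in topic.partitions]
    if watermarks and all(low == high for low, high in watermarks):
        print(f"{name}: no retained data -> verify owner and consumer groups before retiring")

consumer.close()
```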
What it costs
Each topic carries replication and metadata cost, even with zero traffic. At scale, thousands of orphaned topics are real money.
The duplicate case is more expensive: each duplicate stream multiplies producer fanout and consumer cost in addition to the topic itself.
Filter-workaround topics multiply differently: once per unique subset that needs to be exposed, so a source feeding 20 downstream regional teams becomes 20 derived topics plus the source, each carrying its own replication, storage, and metadata cost. Across a typical estate, retirable orphans plus de-duplicable streams plus filter-driven derived topics contribute meaningfully to total cost, and the visibility gap that produced them is itself a governance problem.
The leverage on this work is larger than the per-topic line-item math suggests. Each orphan holds partition-replicas against the per-broker ceiling from Pattern 1. Removing them releases broker headroom and delays the next capacity expansion:
Each retired topic propagates through the chain: partitions released, partition-replicas freed (multiplied by replication factor), broker headroom restored, and ultimately the next step-function capacity expansion delayed. The leverage compounds because the math compounds.
The direct cost of carrying orphans, near-duplicates, and stream-processing topics adds up. The avoided capacity expansion the cleanup buys can be larger than the direct cost on its own.
Stories from the field
5,000 empty topics. None deletable.
One platform team had more than 5,000 empty topics on a single cluster. None were producing or consuming. None could be safely deleted, because the platform team did not own them and could not confirm which were genuinely abandoned and which were waiting on a consumer that ran monthly or quarterly.
Retirement required individual outreach to dozens of project teams, with no incentive on the other side to respond. The estate accumulated topics indefinitely because deletion was a coordination problem nobody had time to solve.
How this cascades
Topic proliferation is often a downstream effect of other patterns or of Kafka design constraints:
- Cluster sprawl (Pattern 3) makes cross-project data sharing harder, so teams replicate data into their own cluster as duplicates rather than subscribing across boundaries.
- Partition overprovisioning (Pattern 1) cascades through Streams and ksqlDB, producing chains of overprovisioned derived topics for every transformation.
- Kafka's inability to scope what a consumer sees on a topic drives a continuous trickle of filtered-subset topics whenever a downstream service needs only part of an upstream stream.
Treating topic proliferation as a standalone cleanup ignores all three sources; the durable fix usually requires touching the upstream cause, not just the symptom.
Inefficient client patterns
Pattern
Is the cumulative cost of client choices visible to the teams making them?
What the pattern looks like
Egress traffic running many times higher than ingress because of redundant consumer groups, unnecessary fan-out, or misconfigured clients. Producers configured with no compression. Consumers polling at high frequency without backoff.
Several sub-patterns show up underneath:
- Compression off. Producer defaults are commonly off for backward-compatibility reasons, and the cost of "off" is invisible to the producer-owning team.
- Consumer fan-out. A topic that started with one consumer ends up with dozens as new services subscribe; each one is a multiplier on egress and broker work.
- Aggressive polling. Consumers configured at default poll rates with no backoff that hammer the broker on idle topics.
- Cross-AZ traffic. On hosted platforms, traffic between availability zones is a direct line item, and consumer-to-leader assignments that do not account for AZ topology pay this tax continuously.
- Large messages without claim-check. Topics carrying multi-megabyte messages that could be replaced with a reference to object storage, which is dramatically cheaper.
Why it happens
Clients are configured by application teams, who optimize for their own service rather than for the cluster they share. From any single application's perspective, "use the default" is the right call: compression off, default poll rates, default fan-out as new services subscribe.
From the cluster's perspective, hundreds of applications using defaults is what produces the inefficiency. The cost penalty is invisible to the producer-owning team, and without a central audit or a guardrail at topic creation, no team has the incentive or the visibility to investigate.
How to identify it
Compare ingress to egress on a per-topic basis. Topics where egress exceeds ingress by more than a small multiplier indicate fan-out that may be addressable through architectural changes (consumer consolidation, claim-check patterns for large messages, caching at the consumer side). Compression auditing is straightforward: pull producer configurations and identify topics where a meaningful share of producers are not using compression.
Three additional checks worth running:
- Consumer count per topic. Topics with high consumer counts and high replay rates are candidates for claim-check or consumer-side caching, since each replay multiplies fan-out.
- Average message size. Topics with messages above a few hundred KB are candidates for claim-check, where the payload goes to object storage and only the reference flows through Kafka.
- Cross-AZ ratio. On hosted platforms with cross-AZ billing, audit which topics have most of their consumers in different AZs from the partition leader; partition assignment can sometimes be reshaped to keep traffic in-AZ.
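A sketch of the egress-to-ingress ranking. The byte counts stand in for BytesInPerSec / BytesOutPerSec style metrics from your monitoring stack, and the 3x threshold is an assumed cut-off for fan-out worth a look:

```python
# Sketch: rank topics by egress-to-ingress ratio and flag fan-out candidates.
FANOUT_THRESHOLD = 3.0

daily_gb = {                       # topic: (ingress GB/day, egress GB/day), illustrative
    "order-events":      (120, 3600),
    "customer-profiles": (40, 95),
    "audit-log":         (200, 210),
}

ranked = sorted(daily_gb.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
for topic, (ingress, egress) in ranked:
    ratio = egress / ingress
    if ratio > FANOUT_THRESHOLD:
        print(f"{topic}: egress {ratio:.0f}x ingress -> review consumer count, "
              f"replay behavior, and claim-check fit")
```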
What it costs
The cost compounds across produce, replicate, and consume. A 1 MB batch with replication factor 3 generates:
- 1 MB of ingress
- 2 MB of internal replication networking
- 3 MB of storage (one copy per replica)
- Egress that scales with the consumer count
Compression typically reduces all of these proportionally. Well-configured compression often achieves 5x to 10x reduction on text-heavy payloads, which translates into the same multiplier on the storage and network side of the bill.
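The arithmetic, worked through for one batch with an assumed 5x compression ratio and ten consumers:

```python
# The same batch traced across produce, replicate, store, and consume, with
# and without compression. The 5x ratio and consumer count are assumptions.
batch_mb = 1.0
rf = 3
consumers = 10
compression_ratio = 5.0           # text-heavy payloads often land in the 5-10x range

def footprint(mb):
    ingress = mb
    replication = mb * (rf - 1)   # copies shipped to follower replicas
    storage = mb * rf             # one copy retained per replica
    egress = mb * consumers
    return ingress, replication, storage, egress

for label, size in (("uncompressed", batch_mb), ("compressed", batch_mb / compression_ratio)):
    i, r, s, e = footprint(size)
    print(f"{label}: ingress {i:.2f} MB, replication {r:.2f} MB, "
          f"storage {s:.2f} MB, egress {e:.2f} MB")
```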
Stories from the field
Egress 30 times higher than ingress on one topic.
A platform team analyzing their estate discovered that one production topic had egress running 30 times higher than ingress. The topic was being consumed by a single application that had grown internally to spawn hundreds of independent consumer instances, each polling at the default rate.
The configuration was technically valid; nothing in the platform's monitoring would have flagged it. The fix was an application-side architectural change to consolidate the consumers. The cost impact was significant in the months between when the duplication started and when the analysis surfaced it.
Static capacity per resource
Pattern
Could pooled capacity replace dedicated allocation?
What the pattern looks like
Each topic, cluster, and connection holding dedicated broker capacity at all times, whether at peak load or sitting idle. Across an estate with hundreds of resources on different load profiles, the sum of those dedicated allocations runs well ahead of true aggregate demand, with no built-in way to pool underlying capacity across them.
Concrete examples of where peaks misalign in real estates:
- Batch jobs. Nightly or monthly loads that run heavily for an hour and sit idle the rest of the time.
- Dev and staging environments. Low-traffic during nights and weekends by nature.
- Time-zone-shifted workloads. Different geographic regions peak on different clocks.
- Seasonal traffic. Retail holidays, financial month-end close, quarterly reporting cycles.
In each case, the resource holds capacity sized for its peak around the clock, even though the peak only happens for a fraction of the time.
Why it happens
Kafka's design allocates capacity per resource. There is no built-in mechanism to share underlying broker capacity across topics or clusters that are not under load simultaneously. A topic provisioned for peak traffic continues to hold that capacity at low traffic, and an underutilized cluster cannot lend its headroom to a busier neighbor.
How to identify it
Plot the load profiles of the top resources by provisioned capacity on the same time axis, then compare the sum of their individual peaks to the aggregate peak across all of them. The gap between those two numbers is the pooled-capacity opportunity. The diagnostic is more involved than the others in this section because it requires correlating load profiles across resources, but the payoff is large in mature estates because peaks rarely align.
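A sketch of that comparison with invented hourly profiles; in practice the series come from your metrics system:

```python
# Sketch of the pooled-capacity diagnostic: sum of individual peaks versus
# the peak of the summed load. Profiles are illustrative only.
profiles = {   # MB/s by hour of day
    "nightly-batch": [5] * 22 + [400, 400],
    "eu-realtime":   [20] * 6 + [180] * 10 + [20] * 8,
    "us-realtime":   [20] * 14 + [180] * 8 + [20] * 2,
}

sum_of_peaks = sum(max(series) for series in profiles.values())
aggregate = [sum(series[h] for series in profiles.values()) for h in range(24)]
aggregate_peak = max(aggregate)

print(f"sum of individual peaks: {sum_of_peaks} MB/s")
print(f"aggregate peak:          {aggregate_peak} MB/s")
print(f"pooling opportunity:     {sum_of_peaks - aggregate_peak} MB/s of provisioned headroom")
```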
Each resource holds capacity sized for its individual peak around the clock. Because peaks rarely align across resources, the aggregate peak across all of them is much lower than the sum of individual peaks. The gap is what pooled capacity recovers.
Non-production environments are the easiest place to start the analysis. By nature they carry low and bursty traffic, so the over-allocation is most visible there and the risk to the business of pooling capacity is low. The same pattern exists in production, but the consolidation work is harder to justify because production teams reasonably optimize for risk over efficiency. Starting with non-prod produces evidence and savings that make the production conversation easier later.
What it costs
This is the pattern that produces the largest absolute savings in big estates, because the aggregate over-allocation grows with resource count. It is also the pattern that requires the most architectural change to address, since pooling capacity across resources is not a configuration setting; it is an infrastructure layer that sits between clients and brokers and has to be introduced deliberately.
How this cascades
This pattern compounds with the others above:
- Pattern 1 is static capacity at the partition level: an overprovisioned partition holds broker capacity it does not use.
- Pattern 3 is the same problem at the cluster level: an underutilized cluster holds infrastructure capacity it does not use.
- Pattern 4's derived stream-processing topics inherit the static capacity of their source topics, multiplying the over-allocation downstream.
Treating the patterns separately is useful for diagnostics; the underlying constraint is shared, which is why pooled capacity tends to produce the largest savings when an estate is mature enough to take it on.
Putting the patterns to work
How waste compounds across the other layers
The four cost layers introduced earlier are not independent. Infrastructure waste pulls the other three layers up with it, which means the leverage on the diagnostics above is larger than the direct infrastructure bill alone suggests.
Ecosystem tooling: paying for the same work multiple times
Stream processing engines and other ecosystem tools sit on top of the infrastructure and inherit its inefficiencies. The clearest example is filter-only stream processing: when a downstream service needs only part of an upstream topic, the common workaround is a Streams, ksqlDB, or Flink job that reads the source and writes a filtered subset to a new topic.
In the estates we have looked at, this often accounts for the majority of stream processing usage; in some cases, 80 percent of Flink jobs are doing nothing more than pulling a subset of messages from one topic into another. The team ends up paying three times for the same work:
- The stream processing engine. Compute and licensing for Flink, ksqlDB, or Streams instances running the filter.
- Developer time. Building and maintaining the filter job, plus the operational overhead of treating it as a production service.
- The resulting infrastructure. The derived topic and its replicas, plus the partition-replicas it consumes against the per-broker ceiling from Pattern 1.
Where the filter is not part of a larger transformation pipeline, application-side filtering removes all three costs at once. Pattern 4 is where the topic-level mechanics live; the point here is that the same waste also pulls the ecosystem-tooling line item up.
Vendor and licensing: the percentage trap
Vendor and licensing costs ripple from infrastructure in ways that are easy to underestimate. Several common line items are priced as a percentage of, or scoped against, total platform spend rather than as flat fees:
- Support contracts. On most hosted Kafka platforms, billed as a percentage of total platform spend, so an inflated infrastructure bill produces a proportionally inflated support line item without anyone deciding to pay more for support.
- Tier upgrades and feature add-ons. RBAC, audit logs, private networking, and multi-region replication, often gated behind tier upgrades that get triggered by cluster size or capacity-unit thresholds.
- Professional services. Engagements scoped against cluster footprint or capacity units, which scale up alongside the infrastructure they are meant to support.
The corollary is that cleaning up infrastructure automatically reduces the support line item by the same proportion. The percentage stays the same; the base it applies to shrinks. No renegotiation required, no separate conversation, just the natural consequence of the infrastructure cleanup.
For estates carrying premium feature add-ons or multi-region replication on top, the multipliers stack: every dollar of infrastructure waste pulls support, premium features, and any percentage-priced services up with it. The leverage on the cleanup is correspondingly larger.
Operational overhead: the cost that does not show up on the bill
Operational overhead is the layer that does not appear on the vendor bill at all. It surfaces as the time the platform team spends on manual operations that would not be necessary if the estate were better governed.
The specific costs include:
- Manual provisioning. Ticket-based topic creation, partition changes that require platform team involvement for every request.
- Firefighting. Capacity emergencies when disks fill or capacity thresholds hit unexpectedly, requiring urgent intervention that pulls engineers off planned work.
- Cross-team coordination. Reaching out to dozens of project teams to confirm ownership before deleting topics, negotiating retention changes, managing exceptions.
- Knowledge loss. When the engineer who understood why certain decisions were made leaves, the remaining team spends time rediscovering context that should have been captured in tooling or policy.
These compound alongside the infrastructure waste. A platform team spending 40 to 60 percent of its time on reactive operations and manual provisioning is common. That share of senior engineering time is the constraint on how many internal teams the platform can support, how well it supports them, and whether the platform team can do anything beyond keeping the lights on.
Stories from the field
Nine-month lead time on new hardware.
A platform team running Kafka on-premises had a nine-month lead time on new machine provisioning. That number set the tempo for everything the team did. When storage on a cluster started to fill, they scrambled to retire topics, tighten retention, or reshuffle workloads, because waiting for hardware was not a viable answer.
There was no proactive cleanup mechanism; cleanup happened when disks were about to fill. The same dynamic plays out in cloud estates, just with a different trigger: cleanup happens when the bill escalates to a level that draws leadership attention.
Operational overhead is the constraint that limits how many internal teams a small platform team can support effectively. A meaningful share of the platform team's time goes to overhead that doesn't grow the platform's reach. Reducing that share is how a small team scales: more Kafka users, more use cases, and a better developer experience, without growing in proportion.
What to do about each pattern
Capturing the savings requires choosing a response that fits the pattern, the estate, and the team's capacity for coordination. Three response categories cover the work:
- Update defaults. Stop new topics, clusters, and clients from inheriting waste at creation. Low coordination, no immediate savings, slows future growth.
- Optimize existing loads. The hygiene and rightsizing work on what is already running. Weeks to months of effort; typically moves the bill 10 to 20 percent.
- Rethink workloads. Architectural changes for cases where the current shape no longer fits. Largest absolute savings in big estates, longest timelines.
Most patterns can be addressed through more than one approach. The mapping is many-to-many, and the right starting point depends on the estate.
| Pattern | Updating defaults | Optimizing existing loads | Rethinking workloads |
|---|---|---|---|
| Partition overprovisioning | Sensible partition defaults for new topics | Right-size existing topics via recreation | Pool partition capacity across topics |
| Retention not matched to consumer needs | Default retention by topic class | Audit and tune existing topics | Not typically required |
| Cluster sprawl | Default cluster tier for new use cases | Consolidate underutilized clusters | Move to pooled cluster capacity |
| Topic proliferation and duplication | Ownership and cataloguing at creation | Retire orphans, deduplicate | Discoverability layer |
| Inefficient client patterns | Compression enforcement, default fan-out limits | Audit consumer groups, restructure fan-out | Architectural changes to consumer model |
| Static capacity per resource | Not addressable through defaults | Not addressable through optimization | Capacity pooling layer |
Resist the urge to tag each pattern with a single primary response. The right choice depends on where the waste actually concentrates in your estate and how much coordination capacity the team has.
Collect the data
Acting on any of these patterns starts with collecting data the platform does not surface by default. Two kinds of data, in two different shapes.
The first is technical. Where the cost is actually coming from in your estate: partition counts versus actual throughput, egress-to-ingress ratios per topic, cluster utilization at peak load, cost-by-team. None of these surface by default; pulling them together is deliberate work.
The second is organizational. What your team and leadership are actually set up to change: drivers, cleanup history, where guardrails get bypassed, how engineers adapt to constraints today. This data lives in interviews, past-cleanup retrospectives, and conversations with leadership, not in cluster metrics.
Both kinds matter. The technical data tells you which patterns are present. The organizational data tells you which response approaches will actually land. A serious analysis depends on both.
These are the kinds of questions worth working through with the team and with leadership. Not a checklist, just examples of where the conversation tends to go when we run cost analyses with platform teams.
1. What's driving the conversation right now? A renewal, a leadership directive after a budget review, on-prem capacity pressure, or a sustained gap between business throughput and Kafka spend. The forcing function shapes which patterns are the priority and how much organizational support the work will have.
2. Is there any mechanism that ties Kafka costs back to the app teams generating them? Chargeback, showback, or even basic visibility into per-team consumption. Without a feedback loop, app teams have no incentive to right-size their own usage, and every cleanup becomes a centralized push by the platform team against teams that bear none of the cost themselves.
3. What do engineers do when they need more partitions on a topic that already exists? Recreate with full producer-consumer coordination (rare in practice), live with the constraint, or spawn a parallel topic. The actual workaround predicts how much partition overprovisioning is being baked in by individual decisions.
4. What do teams do when they need data from a topic in another cluster? Replicate, build a custom bridge, or duplicate the data into their own cluster. The default path predicts whether topic proliferation will reverse with cleanup or keep regenerating.
5. How are guardrail exceptions managed? Whether partition, retention, or cluster guardrails exist matters less than how exceptions are approved and tracked when strategic projects bypass them. Exceptions that accumulate without review become the largest pockets of waste in most estates.
6. Where did the previous cleanup stall? Cross-team coordination to confirm topic ownership, reluctance to touch live workloads, lack of leadership air cover, or political exceptions for strategic projects. The same blocker will usually hit the next cleanup.
The cost analysis we offer covers both halves equally. We help you figure out exactly what data to pull and how to interpret it, and we work through which response approaches will actually move costs within your specific organizational context.
Want to see how much you could save on your Kafka bill?
Get a free Kafka cost analysis with our field engineering team. We will walk through your estate together, identify the waste patterns that apply, and give you a concrete estimate of where the savings are.