Kafka Data Sharing Across Teams

Cross-team Kafka data sharing should be as simple as an API call. Discovery catalogs and approval workflows replace Slack-driven access requests.

Stéphane Derosiaux · December 2, 2025

Cross-team data sharing shouldn't require a project.

When the analytics team needs order data owned by the platform team, the typical process is: post in Slack asking who owns the topic, schedule a meeting to explain the use case, file a ticket for ACL changes, wait days for approval, manually verify the ACLs were created correctly, and finally start consuming.

This takes a week for something that should take an hour. The friction isn't technical—Kafka can handle thousands of consumers without performance degradation. The friction is process: discovering what exists, requesting access, waiting for approval, and configuring consumers.

Real data sharing means: teams can discover what data exists without asking, request access through self-service workflows, get approval from data owners (not platform teams), and start consuming within hours. The platform handles ACL generation, audit trails, and permission tracking automatically.

The Discovery Problem

You can't consume data you don't know exists.

Topic sprawl without visibility creates duplication. Team A builds a customer enrichment pipeline reading from customer-events. Team B, unaware this exists, builds a customer-data-v2 topic with similar data. Now two pipelines process the same information differently, creating inconsistencies and wasting engineering effort.

This happens because discovery relies on tribal knowledge. Engineers ask in Slack: "Is there a topic with customer data?" Someone who's been around a year remembers customer-events. New engineers don't know what to ask for.

Topic catalogs make discovery self-service. Search for "customer," find all topics containing customer data. Filter by owner (show me topics owned by the platform team), schema type (Avro vs JSON), or consumer count (most-used topics).

Each catalog entry shows:

  • Topic name and description
  • Owning team
  • Schema version and fields
  • Sample messages
  • Current consumer count
  • SLA commitments (if the topic is a data product)

Discovery prevents duplication: before creating a new topic, search whether existing topics serve the need. If customer-events already exists with the data you need, consume it instead of rebuilding.
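
When catalog metadata is declared alongside the topic definition, the catalog stays current without manual upkeep. A minimal sketch in the style of the Conduktor resource YAML used later in this article; the description, catalogVisibility, and label conventions are assumptions for illustration, not a guaranteed product schema:

apiVersion: kafka/v2
kind: Topic
metadata:
  cluster: prod
  name: customer-events
  description: "Customer lifecycle events, one message per state change"  # assumed field
  catalogVisibility: PUBLIC        # assumed field: list this topic in the catalog
  labels:
    team: platform-team            # assumed label convention for ownership
    format: avro
spec:
  partitions: 6
  replicationFactor: 3
  configs:
    retention.ms: "604800000"      # 7 days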

Access Request Workflows

Discovery solves "what exists?" Access workflows solve "how do I get permission?"

Self-service requests eliminate Slack threads. Instead of asking "who owns customer-events and can I have access?", the engineer clicks "Request Access" in the topic catalog.

The request form captures (see the sketch after this list):

  • Which consumer/application needs access
  • Business justification ("need customer data for churn prediction model")
  • Required fields (full data vs. subset)
  • Temporary vs. permanent access
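
There is no single standard for this form; an illustrative request payload covering the fields above (all names are hypothetical):

requester: analytics-app-prod      # consuming application instance
topic: customer-events
justification: "need customer data for churn prediction model"
fields: subset                     # e.g. exclude PII columns
duration: 90d                      # temporary; omit for permanent access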

Automatic routing sends requests to data owners. The platform knows the platform-team owns customer-events (through application definitions), so requests route there automatically. No manual lookup of "who owns this?"

Owner review happens asynchronously. Platform team receives notification: "Analytics team requested read access to customer-events." They review business justification, decide whether to approve, and specify conditions (full access vs. masked PII fields).

Approval execution is automatic. When platform team approves, ACLs generate automatically. Analytics team gets confirmation: "Access granted. You can now consume from customer-events." No manual ACL commands, no waiting for platform team to execute changes.
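
This routing works because ownership was declared when the application was registered. A sketch of the owning application instance, using Conduktor's self-serve resources (application and service account names are illustrative):

apiVersion: self-serve/v1
kind: ApplicationInstance
metadata:
  application: "platform-app"
  name: "platform-app-prod"
spec:
  cluster: prod
  serviceAccount: sa-platform
  resources:
    - type: TOPIC
      name: "customer-events"
      patternType: LITERAL    # this instance owns exactly this topic

Any access request for customer-events now has an unambiguous destination: the team behind platform-app-prod.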

Audit trail logs everything: who requested, when, business justification, who approved, when approval happened. Compliance teams can report on data sharing without manual investigation.
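
The shape of a useful audit record is simple. An illustrative entry (field names are assumptions, not a specific product's log schema):

event: ACCESS_GRANTED
topic: customer-events
requestedBy: analytics-app-prod
requestedAt: 2025-11-28T09:14:00Z
justification: "need customer data for churn prediction model"
approvedBy: platform-team
approvedAt: 2025-11-28T13:02:00Z
conditions: pii-masked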

This workflow takes hours, not days. Most importantly, it doesn't block platform teams on routine access requests.

GitOps-Based Data Sharing

For teams managing infrastructure as code, data sharing integrates with GitOps workflows.

Access declared in YAML using Conduktor resource definitions:

apiVersion: self-serve/v1
kind: ApplicationInstancePermission
metadata:
  # The grant is declared by the application that owns the resource
  # (platform team) and names the consuming instance in grantedTo.
  application: "platform-app"
  appInstance: "platform-app-prod"
  name: "analytics-reads-customer-events"
spec:
  resource:
    type: TOPIC
    name: "customer-events"
    patternType: LITERAL
  serviceAccountPermission: READ
  grantedTo: "analytics-app-prod"

Pull request for review: the analytics team commits the permission request to Git and opens a PR. The platform team (configured as code owners for customer-events) receives the PR notification.

Approval through PR merge: Platform team reviews PR, approves by merging. CI/CD applies the resource via conduktor apply, generating ACLs automatically.
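
A minimal CI step, assuming GitHub Actions, the Conduktor CLI pre-installed on the runner, and secrets named CDK_BASE_URL and CDK_API_KEY (the file layout is an assumption):

# .github/workflows/apply-kafka-resources.yml (illustrative)
name: Apply Kafka resources
on:
  push:
    branches: [main]    # runs once the PR is merged
jobs:
  apply:
    runs-on: ubuntu-latest
    env:
      CDK_BASE_URL: ${{ secrets.CDK_BASE_URL }}   # Console URL
      CDK_API_KEY: ${{ secrets.CDK_API_KEY }}     # CLI credentials
    steps:
      - uses: actions/checkout@v4
      - name: Apply permission resources
        run: conduktor apply -f permissions/      # every YAML in the folder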

Benefits: Version control (every access grant is a Git commit), code review (access requests reviewed like code), and rollback capability (revert the commit to revoke access).

GitOps works best for teams already managing Kafka resources (topics, schemas, configs) in Git. It provides consistency: all Kafka changes go through the same workflow.

Data Sharing at Scale

Small organizations (5-10 teams) can coordinate data sharing through informal communication. Large organizations (50+ teams) need structured processes.

Ownership patterns define which teams own which data. Topics matching orders.* are owned by orders-team, inventory.* by inventory-team, analytics.* by analytics-team.

Ownership enables:

  • Automatic routing (access requests go to the right team)
  • Accountability (each team is responsible for their data products)
  • Self-service (teams manage their own approval workflows)

Usage analytics show how data is consumed. For each topic, track:

  • Which teams consume it (10 teams read customer-events)
  • What's their lag (are consumers keeping up?)
  • When they started consuming (last month vs. two years ago)

This informs product decisions. If 10 teams consume a topic, schema changes require broad coordination. If zero teams consume it, the topic might be deprecated.

Consumer feedback loops capture quality issues. When consumers encounter problems (late data, malformed messages, unexpected nulls), they file issues with the owning team. The platform routes issues to owners automatically, ensuring feedback reaches the right people.

Multi-Tenancy and Isolation

Organizations with strict data isolation requirements (financial services, healthcare) need multi-tenancy: teams can share infrastructure without accessing each other's data.

Namespace-based isolation groups topics by team. Team A's topics live in the team-a.* namespace, Team B's in team-b.*. ACLs prevent cross-namespace access unless explicitly granted.

This provides logical isolation without separate clusters. Platform teams manage one cluster; tenant teams operate independently within their namespaces.
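
Namespace ownership is the same self-serve pattern shown earlier, with PREFIXED matching instead of LITERAL. A sketch (names illustrative):

apiVersion: self-serve/v1
kind: ApplicationInstance
metadata:
  application: "team-a-app"
  name: "team-a-app-prod"
spec:
  cluster: prod
  serviceAccount: sa-team-a
  resources:
    - type: TOPIC
      name: "team-a."
      patternType: PREFIXED   # owns every topic whose name starts with team-a.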

Proxy-based isolation uses Kafka proxies (like Conduktor Gateway) to enforce tenant boundaries at the network layer. Teams connect to proxy endpoints scoped to their tenancy. The proxy routes to the appropriate topics and enforces isolation.

Benefits: centralized infrastructure, logical isolation, simplified operations (one cluster to manage vs. dozens of tenant-specific clusters).

Data Sharing with External Partners

Sharing data with external organizations (vendors, partners, customers) requires additional controls.

External access patterns differ from internal access:

  • Temporary access (partner needs data for 3-month project)
  • Rate limiting (prevent partner from overwhelming cluster)
  • Data masking (partner sees anonymized data, not raw PII)
  • Network isolation (partner connects through dedicated endpoint)

API-based data sharing abstracts Kafka complexity. Instead of giving partners direct Kafka access, expose data through REST APIs or GraphQL. Partners make HTTP requests, backend consumes from Kafka and serves responses.

This simplifies partner integration (HTTP is universal, Kafka clients aren't) and provides control (rate limiting, authentication, and logging at API layer).

Change data capture (CDC) for sharing uses Kafka Connect to replicate data to partner systems. The partner gets a database replica updated in real time from Kafka topics, without managing Kafka consumers.
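
A sketch of such a replication using Kafka Connect's JDBC sink connector, a common choice; connection details are placeholders. Connect's REST API accepts this as JSON, rendered here in the same YAML style as the rest of the article:

name: partner-customer-replica
config:
  connector.class: io.confluent.connect.jdbc.JdbcSinkConnector
  topics: customer-events
  connection.url: jdbc:postgresql://partner-db:5432/replica   # placeholder endpoint
  connection.user: replicator                                 # placeholder credentials
  insert.mode: upsert          # update existing rows instead of appending duplicates
  pk.mode: record_key          # use the Kafka record key as the primary key
  auto.create: "true"          # create the target table if it doesn't exist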

Measuring Data Sharing Effectiveness

Track time to access, usage diversity, and duplication rate.

Time to access measures lead time from request to consuming. Manual processes measure this in days. Automated workflows measure it in hours.

Target: Under 4 hours for standard requests. Cross-team data sharing shouldn't require days of waiting.

Usage diversity measures how many teams consume shared data. If customer-events is consumed by 15 teams across 5 departments, data sharing is working—teams reuse instead of rebuild.

Low usage (2-3 consumers per topic) might indicate discovery problems: teams don't know data exists, so they build their own.

Duplication rate measures how many topics serve similar purposes. If three teams created customer-data, customer-v2, and customer-enriched because they didn't know about each other, duplication indicates sharing failure.

Target: Under 5% duplication rate. Most data should be shared, not replicated across teams.

Security and Compliance in Data Sharing

Data sharing requires balancing accessibility with security.

Least privilege for consumers means granting minimal necessary access. If analytics needs customer email for reporting but not credit card numbers, grant access with PII masking enabled.

Proxy-based solutions (Conduktor Gateway) enable field-level filtering: consumers get topics with sensitive fields masked or removed transparently.
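
In Conduktor Gateway this is done with interceptors. A sketch of a masking interceptor declaration; the pluginClass value and config shape are assumptions for illustration, not the exact product schema:

apiVersion: gateway/v2
kind: GatewayInterceptor
metadata:
  name: mask-customer-pii
  scope:
    vCluster: passthrough                  # assumed default scope
spec:
  pluginClass: "io.conduktor.gateway.interceptor.MaskingPlugin"   # assumed class name
  priority: 100
  config:                                  # assumed config shape
    topic: "customer-events"
    fields:
      - name: "credit_card"
        maskWith: "****"

Consumers connecting through the Gateway see the masked field; the underlying topic is untouched.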

Data classification labels topics by sensitivity (public, internal, confidential, PII). Access workflows enforce different approval requirements based on classification, as sketched after this list:

  • Public: self-service approval
  • Internal: manager approval
  • Confidential: data owner approval
  • PII: data owner + security team approval
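
One way to wire classification to approvals: a classification label on each topic (using the label convention sketched earlier) plus a routing policy keyed on it. The policy below is pseudo-config for illustration, not a specific product schema:

approvalPolicies:
  - classification: public
    approvers: []                          # self-service, auto-approved
  - classification: internal
    approvers: [requester-manager]
  - classification: confidential
    approvers: [data-owner]
  - classification: pii
    approvers: [data-owner, security-team] # both must approve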

Access expiration ensures temporary access doesn't become permanent. If a team needs data for a 3-month project, access expires automatically after 3 months. Owners can renew if needed, but default is expiration.

Audit logs track all data sharing: who accessed what, when, and why. Compliance teams generate reports showing: all access to PII topics in the last quarter, all external partner access, all revoked permissions.

The Path Forward

Kafka data sharing should be frictionless: discover what exists through catalogs, request access through self-service workflows, get approval within hours, and start consuming with automatically generated ACLs.

Conduktor provides topic catalogs for discovery, approval workflows for access requests, application-based ownership for routing, and audit trails for compliance. Organizations report 75% fewer provisioning tickets and faster cross-team collaboration through self-service data sharing.

If cross-team data sharing requires Slack threads and week-long approval processes, the problem isn't Kafka—it's the lack of structure around discovery and access.


Related: Partner Zones · Kafka Data Products · Secure Data Sharing