Kafka Security: Access Control That Scales

Kafka security goes beyond TLS encryption: scalable access control, ACL management, and provable permissions for production environments.

Stéphane Derosiaux · October 21, 2025

Enabling TLS doesn't make your Kafka deployment secure. It makes it encrypted.

Security is knowing who can read which topics, why they have that access, and proving both on demand. Most teams enable SASL authentication and TLS encryption, declare victory, and move on. Then an auditor asks "who can access customer PII?" and the answer requires three days of ACL archaeology because nobody actually knows what the 847 ACLs in production allow.

Real Kafka security has three layers: authentication (who are you?), authorization (what can you access?), and encryption (is data protected in transit and at rest?). Most teams get authentication and encryption right. Authorization is where things fall apart.

The problem isn't that teams don't use ACLs. It's that ACLs grow organically and nobody documents why they exist, so after six months, explaining what they allow becomes an archaeological dig through ticket systems and Slack logs.

The ACL Sprawl Problem

ACL sprawl happens gradually. It starts with reasonable requests: the orders-service needs to read the orders topic. You grant an ACL allowing the orders-service principal to read the orders topic. One week later, the same service needs to read the inventory topic. Another ACL. Then it needs to write to the orders-processed topic. Another ACL.

Six months later, you have hundreds of ACLs. Some were granted for testing and never revoked. Some grant access "just in case" for services that never actually consume the data. Some use wildcard patterns like service-* that match more resources than intended.

The security issue isn't the ACLs themselves—it's that nobody can explain what they collectively allow. When a team asks "who can access this topic containing customer emails?" the honest answer is often "we'd have to check." That's not security. That's hoping nothing goes wrong.

ACL sprawl has three causes: lack of cleanup, overly broad grants, and no ownership mapping.

Lack of cleanup happens because revocation requires knowing what's no longer needed. When a service is decommissioned, its service account should lose access. But if nobody documents which service accounts map to which applications, cleanup doesn't happen. Permissions accumulate.

Overly broad grants happen because narrowly scoped permissions require more work. It's easier to grant service-x read access to topic-* (all topics) than to list the three specific topics it needs. The security impact isn't obvious initially, but when that wildcard expands from 10 to 100 topics, the service suddenly has access to data it shouldn't touch.
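That scope creep is easy to demonstrate. A hypothetical sketch using Python's fnmatch as a stand-in for glob-style grants (note that Kafka's own prefixed ACLs use literal prefixes, not glob syntax; the matching behavior is what matters here):

```python
from fnmatch import fnmatch

def topics_matched(pattern, topics):
    """Return the topics a wildcard-style grant would cover."""
    return sorted(t for t in topics if fnmatch(t, pattern))

# Day one: the grant looks harmless.
topics = ["service-x-events", "service-x-retries"]
print(topics_matched("service-*", topics))
# → ['service-x-events', 'service-x-retries']

# Months later, unrelated topics match the same grant.
topics += ["service-y-payments", "service-z-pii-exports"]
print(topics_matched("service-*", topics))
# → now also includes service-y-payments and service-z-pii-exports
```

Nobody re-reviewed the grant when the new topics appeared; the pattern simply absorbed them.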

No ownership mapping means you can't answer "who owns this topic?" If ownership is unclear, access reviews are impossible. You can't determine whether a service account should have access if you don't know who's responsible for approving it.

Principle of Least Privilege Applied to Streaming Data

Least privilege means granting the minimum access necessary for a service to function. For Kafka, this translates to specific guidance:

Grant access to specific topics, not wildcards. If a service needs three topics, grant access to those three topics explicitly. Avoid patterns like *, team.*, or service-* unless the service genuinely needs access to all matching resources.

Grant read-only access unless writes are required. Many services only consume data. They shouldn't have write permissions "just in case." Read-only access limits the blast radius if credentials are compromised.

Use service accounts per application, not shared credentials. When five services share one service account, you can't determine which service accessed which topic. Separate service accounts mean access patterns are traceable to specific applications.

Expire temporary access automatically. Access granted for testing or migration should have expiration dates. Tools like Kafka Security Manager can enforce this through policy, automatically revoking permissions when the expiration date passes.
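The revocation loop itself is simple once grants carry metadata. A minimal sketch, assuming each grant is tracked with an expires_at field (the field name and grant shape are hypothetical, not a Kafka or KSM API):

```python
from datetime import datetime, timedelta, timezone

def active_grants(grants, now=None):
    """Filter out grants whose expiration has passed; expires_at=None means permanent."""
    now = now or datetime.now(timezone.utc)
    return [g for g in grants if g["expires_at"] is None or g["expires_at"] > now]

now = datetime.now(timezone.utc)
grants = [
    {"principal": "User:orders-service", "topic": "orders", "expires_at": None},
    {"principal": "User:migration-job", "topic": "orders",
     "expires_at": now - timedelta(days=1)},  # temporary grant, expired yesterday
]
keep = active_grants(grants, now)
# Only the permanent grant survives; the expired one is queued for revocation.
```

Run on a schedule, the difference between the tracked grants and the active set becomes the revocation work list.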

One evaluation rule is worth internalizing when designing least-privilege ACLs: when multiple ACLs apply to the same principal and resource, DENY rules override ALLOW rules. Used carefully, a targeted DENY can claw back access that a broad ALLOW would otherwise grant; used carelessly, the interaction causes accidental over- or under-permissioning.
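That precedence can be sketched directly. A simplified model of the evaluation order (real brokers also consider resource patterns, hosts, operations like Describe, and super.users, all ignored here):

```python
def is_authorized(acls, principal, operation, topic):
    """DENY beats ALLOW; no matching ACL at all means denied by default."""
    matching = [a for a in acls
                if a["principal"] == principal
                and a["operation"] == operation
                and a["topic"] == topic]
    if any(a["permission"] == "DENY" for a in matching):
        return False
    return any(a["permission"] == "ALLOW" for a in matching)

acls = [
    {"principal": "User:svc", "operation": "Read", "topic": "pii", "permission": "ALLOW"},
    {"principal": "User:svc", "operation": "Read", "topic": "pii", "permission": "DENY"},
]
is_authorized(acls, "User:svc", "Read", "pii")  # → False: DENY wins
```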

Authentication Patterns: SASL vs. mTLS

Authentication proves identity. Kafka supports multiple mechanisms: SASL/PLAIN, SASL/SCRAM, SASL/GSSAPI (Kerberos), and mTLS (mutual TLS).

SASL/SCRAM is the most common for production deployments. Credentials are stored in ZooKeeper or, on newer clusters, in KRaft metadata, and clients authenticate using username/password hashed with SCRAM-SHA-256 or SCRAM-SHA-512. The advantage is operational simplicity: adding a user doesn't require certificate management. The disadvantage is that credential rotation requires client restarts.
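In client terms, SASL/SCRAM is a handful of properties. A sketch using the Apache Kafka Java client's property names, held in a Python dict for illustration (the host and credentials are placeholders; in practice the password comes from a secret store):

```python
# Client properties for SASL/SCRAM over TLS (Java-client property names).
scram_config = {
    "bootstrap.servers": "broker1:9093",   # placeholder host
    "security.protocol": "SASL_SSL",       # SASL authentication over a TLS connection
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.jaas.config": (
        'org.apache.kafka.common.security.scram.ScramLoginModule required '
        'username="orders-service" password="CHANGEME";'  # never hardcode in real configs
    ),
}
```

Note the pairing: SASL handles authentication, while SASL_SSL ensures the credentials themselves travel over an encrypted channel.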

mTLS provides stronger authentication through client certificates. Each client presents a certificate signed by a trusted CA, and the broker validates it. This is operationally heavier—certificates have to be generated, distributed, and rotated—but provides non-repudiation (the certificate proves who connected).

The choice depends on operational constraints. SASL/SCRAM is simpler for teams without PKI infrastructure. mTLS is stronger for environments with mature certificate management.

Regardless of mechanism, the principle is the same: every connection must be authenticated. Anonymous access is a vulnerability, not a convenience.

Encryption: Transit, Rest, and End-to-End

Encryption protects data at different layers. Understanding which layer matters for your threat model prevents over-engineering or under-securing.

Encryption in transit protects data as it moves between clients and brokers. TLS encrypts connections, preventing network sniffing. This is table stakes for any production deployment and solves the problem of network-level eavesdropping.

Encryption at rest protects data stored on broker disks. If an attacker gains access to a broker's storage (stolen disk, misconfigured cloud permissions), encrypted disks make the data unreadable without keys. Cloud providers offer disk-level encryption (AWS EBS encryption, GCP disk encryption). This solves the problem of physical media compromise.

End-to-end encryption protects message payloads from producer to consumer. Data is encrypted before being sent to Kafka and decrypted after being read from Kafka. This means even broker administrators can't read message contents. This solves the problem of untrusted infrastructure but adds operational complexity (key management, application-level encryption logic).
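The shape of that pattern is small, even if the key management around it is not. A minimal sketch using the cryptography package's Fernet as a stand-in for whatever cipher and key service a real deployment runs (real systems add key rotation, per-topic keys, and a KMS):

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice: fetched from a KMS, never generated inline
cipher = Fernet(key)

# Producer side: encrypt before handing bytes to the Kafka producer.
plaintext = b'{"email": "alice@example.com"}'
envelope = cipher.encrypt(plaintext)  # this ciphertext is all the brokers ever store

# Consumer side: decrypt after polling.
assert cipher.decrypt(envelope) == plaintext
```

Because brokers only ever see the envelope, a compromised broker (or an over-curious administrator) learns nothing about message contents.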

Most deployments need encryption in transit (TLS). Regulated environments need encryption at rest. High-sensitivity environments (financial data, healthcare records) need end-to-end encryption.

Access Control Best Practices

Security at scale requires moving from individual ACLs to application-based permissions.

Instead of thinking "grant service-account-x read access to topic-y," think "application-orders owns topics matching orders.* and its service account gets appropriate permissions automatically."

This shifts from managing hundreds of individual ACLs to defining ownership patterns. Applications declare what they own, and ACLs are generated based on ownership. When a new topic matching the pattern is created, ACLs are applied automatically.
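A sketch of that generation step, assuming each application declares a topic prefix it owns (the application names, prefix convention, and ACL shape here are illustrative):

```python
def generate_acls(ownership, topics):
    """Derive ACLs from ownership prefixes instead of handcrafting them per topic."""
    acls = []
    for app, prefix in ownership.items():
        for topic in topics:
            if topic.startswith(prefix):
                for op in ("Read", "Write"):
                    acls.append({"principal": f"User:{app}",
                                 "operation": op,
                                 "topic": topic,
                                 "permission": "ALLOW"})
    return acls

ownership = {"application-orders": "orders."}
topics = ["orders.created", "orders.processed", "inventory.levels"]
acls = generate_acls(ownership, topics)
# A newly created orders.* topic picks up ACLs on the next run; inventory.levels never does.
```

The point is the inversion: the declaration ("application-orders owns orders.*") is the source of truth, and the hundreds of concrete ACLs are disposable output.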

Approval workflows layer on top. When an application needs to read data owned by another team, it requests access. The owning team reviews and approves. The approval generates ACLs automatically, with audit trails recording who requested access, who approved it, and when.

This model scales because:

  • Permissions map to business logic (which team owns which data)
  • ACLs are generated, not handcrafted
  • Access requests have approval workflows and audit trails
  • Ownership is documented through patterns, not spreadsheets

Operational Security: Credential Rotation and Audit Logs

Security isn't static. Credentials need rotation, access needs review, and activity needs logging.

Credential rotation prevents long-lived credentials from becoming vulnerabilities. Service account passwords should rotate quarterly or monthly. TLS certificates should rotate before expiration. Rotation procedures should be tested regularly, not discovered during an outage when a certificate expires unexpectedly.

Access reviews validate that permissions still match requirements. Quarterly reviews should ask: does this service account still need these permissions? Is the service still active? Has the owning team changed? Access reviews catch permissions that should have been revoked but weren't.

Audit logs record who accessed what and when. Kafka audit logs (not application logs, but access logs at the broker level) show which principals read from which topics. These logs answer security questions: did service-x access customer data? When did unusual access patterns start? Who modified ACLs last week?

Audit logs are useless if they can't be queried efficiently. Storing logs in object storage is good for compliance but bad for investigation. Logs should be searchable: "all access to topics containing PII by principals not in the data-platform team" should be a query, not a grep job.
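What "a query, not a grep job" looks like in practice: a sketch assuming audit records are structured (principal, topic, operation) rather than free text; the record fields and team names are hypothetical:

```python
def suspicious_access(records, pii_topics, trusted_principals):
    """All access to PII topics by principals outside the trusted set."""
    return [r for r in records
            if r["topic"] in pii_topics
            and r["principal"] not in trusted_principals]

records = [
    {"principal": "User:data-platform", "topic": "customers.pii", "operation": "Read"},
    {"principal": "User:marketing-job", "topic": "customers.pii", "operation": "Read"},
    {"principal": "User:marketing-job", "topic": "clicks", "operation": "Read"},
]
hits = suspicious_access(records, {"customers.pii"}, {"User:data-platform"})
# → one hit: marketing-job reading customers.pii
```

Structured records make this a one-liner in any log store; raw text logs make it an afternoon.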

Enforcing Security Policies

Security policies that aren't enforced might as well not exist. Documenting "service accounts should use strong passwords" doesn't prevent weak passwords. Policy enforcement does.

For Kafka, enforcement happens at multiple layers:

Broker-level enforcement: Brokers reject unauthenticated connections, deny unauthorized operations, and require TLS for sensitive topics. This is Kafka's native security model.

Control plane enforcement: Topic creation policies can require encryption settings, reject weak configurations, and enforce ownership requirements. A topic containing PII can't be created without encryption enabled and an assigned owner.
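A sketch of such a creation-time check, with actionable errors rather than silent failure (the request fields and policy rules are hypothetical examples of what a control plane might enforce):

```python
def validate_topic_request(req):
    """Reject insecure topic requests at creation time, with clear error messages."""
    errors = []
    if req.get("contains_pii") and not req.get("encryption_enabled"):
        errors.append("PII topics require encryption to be enabled")
    if not req.get("owner"):
        errors.append("every topic needs an assigned owner")
    return errors  # empty list means the request passes policy

validate_topic_request({"name": "customers.pii", "contains_pii": True})
# → two errors: missing encryption and missing owner
```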

Application-level enforcement: Producers and consumers can enforce schema validation, data masking, and access logging beyond what Kafka provides natively.

The most effective enforcement is automated and immediate. Invalid requests should fail at creation time with clear error messages, not silently create insecure resources that get flagged later during audit.

Measuring Security Posture

Security isn't binary. Track three metrics: permission scope, credential age, and audit response time.

Permission scope measures how many ACLs use wildcards versus explicit grants. Lower is better. If 60% of your ACLs use wildcards, permissions are probably too broad.

Credential age measures how long service account credentials and certificates have existed without rotation. Credentials older than 90 days should be flagged for rotation.

Audit response time measures how quickly you can answer "who accessed topic X?" If this takes days, your audit logs aren't good enough. Target under one hour for any security question.
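The first two metrics are mechanical to compute once ACLs and credentials are inventoried. A sketch, assuming wildcard grants are recognizable by a trailing asterisk (adjust for however your inventory encodes prefixed patterns):

```python
from datetime import datetime, timedelta, timezone

def wildcard_fraction(acls):
    """Share of ACLs whose topic grant is a wildcard or prefix pattern."""
    wild = sum(1 for a in acls if a["topic"] == "*" or a["topic"].endswith("*"))
    return wild / len(acls) if acls else 0.0

def stale_credentials(creds, max_age_days=90, now=None):
    """Names of credentials older than the rotation threshold."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [c["name"] for c in creds if c["created_at"] < cutoff]

acls = [{"topic": "orders"}, {"topic": "service-*"}]
wildcard_fraction(acls)  # → 0.5: half the grants are wildcards
```

Both numbers belong on a dashboard; trends matter more than any single reading.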

The Path Forward

Kafka security isn't about enabling features. It's about knowing who can access what, ensuring only authorized access happens, and proving both on demand.

Conduktor provides ACL visualization, application-based permissions, approval workflows for cross-team access, and automated audit trails across all Kafka clusters. Instead of managing hundreds of individual ACLs, teams define ownership patterns and let the platform generate permissions automatically. Access requests flow through approval workflows with full audit trails. Security questions get answered through reports, not manual investigation.

If explaining your Kafka permissions requires archaeology, the problem isn't your documentation—it's your security model.


Related: Kafka Security Best Practices → · Kafka Encryption → · Enterprise Kafka Security →