Kafka GitOps: Preparing Your Deployment to Scale Efficiently

Complex infrastructure calls for automation. See how GitOps streamlines Kafka configuration and resource management, and scale your deployments efficiently.

James White

July 4, 2024

Managing complex infrastructures for distributed systems like Apache Kafka requires more than manual intervention; it calls for a scalable and automated approach. Kafka GitOps and infrastructure as code (IaC) can no longer be an afterthought.

According to Confluent’s 2023 Data Streaming Report, 74% of IT leaders cite inconsistent use of integration methods and standards as a significant hurdle to advancing data streaming.

As deployments grow in size and complexity, it becomes more difficult for platform teams to keep track of configurations, updates, and dependencies. At the same time, identifying ownership, discovering existing resources, and sharing data across teams become problematic for development teams.

GitOps and IaC are foundational practices in the DevOps and cloud-native spaces that can also bring consistency, standardization, and business agility to Kafka users.

Read on to discover in detail how implementing these approaches can help prepare your Kafka deployment for scaling.

TL;DR

  • Manual configuration often can't keep up with changes in scaling Kafka deployments.

  • Kafka GitOps is an approach that allows automating resource and configuration management in Apache Kafka.

  • Kafka client configurations are a minefield, but you can reduce the danger with appropriate policies and tools that let you implement them at scale.

  • Conduktor can help you create a framework for consistency, standardization, and business agility when preparing to scale.

Zooming in on Kafka GitOps

In the context of Kafka, GitOps relates to:

  • Deployment automation

  • Resource configuration

  • Access provisioning

  • Kafka client configurations

Automating these processes empowers developers to implement changes faster, introduces governance best practices, and alleviates the burden on platform teams responsible for Kafka operations.

Why manual Kafka configuration management isn’t enough

At first, a limited number of Kafka projects with focused scopes is usually manageable for ops and platform teams.

Such teams typically handle resource requests (topic creation, configuration changes, partition modification, schema registration, and access requests) using Jira tickets and manual resource provisioning.

However, as adoption increases, the number of requests multiplies and Kafka’s infrastructure complexity grows. At that point, the team needs to grow to support it, but manual methods quickly become tedious and inefficient.

Ad hoc changes also increase the risk of human error, inconsistencies, and misconfigurations. For example, a simple typo in an access-control list (ACL) entry can easily break a legitimate consumer’s access or let an unauthorized one through.

The manual process lacks version control, traceability, and transparency—that’s where automation can help you step up your game.

Kafka GitOps for automated resource management

Imagine handling over 100 Transport Layer Security (TLS) certificates, 3,500 Avro schemas, 1,000 topics, and 5,000 ACLs.

It would be impossible to pull that off manually without ending up with a mess of topic names, over-provisioned partitions, and no uniform strategy for managing broker, producer/consumer, and security configurations.

If you want to scale adoption beyond a critical mass of teams, resources, and projects, you need a Kafka GitOps process.

A GitOps configuration-as-code approach lets you automate the management of topics and ACLs using configuration files that define resources and access, similarly to how Terraform provisions infrastructure.

Why you may need to go beyond IaC when scaling Kafka

While handling parts of declarative configuration in Kafka with tools like Terraform is manageable, as you scale up, you’ll quickly find you need a more specialized solution.

This challenge quickly becomes evident in issues like topic management and even creating clusters in production.

Choosing the right tool is a topic (pun intended!) for another time, so let’s focus on Conduktor, a comprehensive way to streamline your Kafka GitOps.

GitOps for Kafka configuration management

Topics, schema subjects, connect configurations, and security configurations can be numerous and diverse. You need a way to manage them all efficiently and without errors, so that the growing multitude of configurations doesn’t degrade your performance and reliability.

GitOps lets you store Kafka configurations in repositories as YAML or JSON files. The example below shows a Kafka topic represented as YAML for use with Conduktor:

---
apiVersion: kafka/v2
kind: Topic
metadata:
  cluster: lzp-prod
  name: click.event-stream.avro
spec:
  replicationFactor: 3
  partitions: 3
  configs:
    min.insync.replicas: '2'
    cleanup.policy: delete
    retention.ms: '60000'

Storing resource configurations as code lets you manage changes to the desired state of your Kafka infrastructure through Git pull requests. This practice relies on three fundamental principles: review, approval, and audit trails.

The result is a more transparent setup, with self-documenting artifacts and a collective understanding of the Kafka infrastructure. This encourages communication and knowledge sharing between teams.

In Kafka, GitOps keeps costly configurations in check

But what good is an IaC approach without control over the configurations? To avoid a Wild West scenario, you'll need automated policy enforcement in your CI/CD pipeline.

As a platform administrator, you’ll want to globally restrict expensive Kafka configs such as:

  • Replication factor of 3 to ensure high availability and fault tolerance.

  • Max partitions of 10 to prevent excessive resource consumption.

  • Max retention of 1 day to limit storage costs.

  • Topic naming that follows internal standards for semantic clarity.

Here’s an example of a Kafka topic policy that implements the above in Conduktor:

---
apiVersion: "v1"
kind: "TopicPolicy"
metadata:
  name: "prod-topic-policy"
spec:
  policies:
    spec.replicationFactor:
      constraint: OneOf
      values: ["3"]
    spec.partitions:
      constraint: Range
      max: 10
      min: 3
    spec.configs.retention.ms:
      constraint: Range
      max: 86400000
      min: 60000
    metadata.name:
      constraint: Match
      # enforces names like click.event-stream.avro
      pattern: ^click\.(?<event>[a-z0-9-]+)\.(avro|json)$

The beauty of a resource configuration policy is that you can store any variation of it as IaC and orchestrate it as checks within CI/CD workflows, as the sketch below shows.
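For instance, a CI job can dry-run every pull request against these policies before anything reaches a cluster. Below is a minimal sketch assuming GitHub Actions and the Conduktor CLI; the kafka/ path, the secret names, and the CLI being preinstalled on the runner are illustrative assumptions:

# Illustrative CI check: dry-run Kafka resource changes on every pull request
name: kafka-resources
on:
  pull_request:
    paths:
      - "kafka/**.yaml"   # assumed location of the resource definitions

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate changes against topic policies
        env:
          CDK_BASE_URL: ${{ secrets.CDK_BASE_URL }}   # Conduktor Console URL
          CDK_API_KEY: ${{ secrets.CDK_API_KEY }}     # service API key
        run: conduktor apply -f kafka/ --dry-run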

As a result, you can introduce automated provisioning of Kafka resources without the risks associated with complex configurations.

By design, it adds a comprehensive Kafka governance layer, minimizing deployment errors, reducing manual effort, and introducing a safe framework for scaling.

Should you centralize or decentralize Kafka configurations?

IaC enables platform teams to push responsibility out to domain owners for managing their Kafka configurations. This capability helps remove some dependencies on platform teams, who rarely have the relevant business context behind specific Kafka configurations.

By empowering domain owners, platform teams can focus on providing the tools, frameworks, and workflows to support IaC implementation. This shift in responsibilities fosters a culture of accountability and ownership, which is fundamental for scaling. If it takes more than a few hours to create a topic, you’re likely experiencing a bottleneck.

An IaC approach raises questions about how a company should structure its Kafka operations:

  • Centralized approach: One repository to store Kafka configurations for the whole company.

  • Decentralized approach: Separate repositories for each team or domain.

The right choice depends on how your company is structured and on the trade-off you want between agility and governance.

Perhaps the best solution is a hybrid approach, where teams manage less critical configurations themselves and the most critical ones are managed centrally for consistency and compliance, as sketched below.
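One way to picture that hybrid split is a repository layout along these lines (all names purely illustrative):

platform-kafka/          # central: critical, compliance-sensitive configuration
  policies/              # topic policies and client configuration rules
  clusters/              # cluster and security configuration

payments-kafka/          # per-domain: day-to-day resources owned by one team
  topics/
  schemas/

clickstream-kafka/
  topics/
  schemas/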

Combining resource policies with CI/CD isn’t a silver bullet

In the Kafka ecosystem, you have to look beyond resource configurations to understand where additional challenges and complexities exist. Streaming applications connect directly to Kafka, and their behavior and configuration typically don’t follow GitOps principles.

There are over 100 client configuration settings in Kafka. You can use your Kafka client's default settings, and while this choice may not seem like a problem at first, you’ll need to factor in the impact of those defaults on your network, disk, quality of service, and costs as you scale.

Below is an example of a Kafka client configuration. You can consult the handy Kafka Options Explorer for a complete list.
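A minimal sketch of such a configuration in properties form (the lz4 codec is an illustrative choice):

# Client defaults, stated explicitly for visibility:
# 16384 bytes max per batch, and no artificial send delay
batch.size=16384
linger.ms=0

# Deliberate overrides: compress batches and wait for
# all in-sync replicas before acknowledging a write
compression.type=lz4
acks=all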

This shows a configuration with the default Kafka client settings for batch.size and linger.ms, a custom compression.type setting (to enhance performance), and acks set to all for maximum reliability.

Kafka client configurations: getting out of the danger zone

Kafka client configurations are a minefield. It's a tall order for developers to keep track of the intricacies and consequences of each setting.

One poorly informed configuration can severely impact your entire Kafka platform and the underlying applications.

Luckily, with a sprinkle of automation, you can avoid mistakes with client configurations:

{
  "config" : {
    "acks" : {
      "value" : [ 0, 1 ],
      "action" : "BLOCK"
    },
    "compressions" : {
      "value" : [ "NONE" ],
      "action" : "BLOCK"
    },
    "recordHeaderRequired": {
      "value": true,
      "action": "BLOCK"
    },
  }
}

This is a policy on client configurations that enforces:

  • acks (acknowledgments): values 0 and 1 are blocked, so producers must use -1 (all) for the highest level of durability and reliability

  • compressions: NONE is blocked, so a compression format must be used to conserve storage and reduce network bandwidth

  • recordHeaderRequired: a record header, used for message routing and filtering, is required

Such policies let teams operate autonomously but maintain overarching control of producer/consumer settings at a global level, making them an essential part of Kafka GitOps.

Enforcing Kafka client configuration best practices at scale

As we previously highlighted, GitOps traditionally doesn’t govern configurations of applications that directly connect to Kafka. You can, however, solve this issue architecturally with a proxy.

Conduktor Gateway is an intermediary proxy for handling Kafka requests before forwarding them to the broker. This could involve evaluating the request against client configuration policies and manipulating it, for example, by adding field-level encryption before sending it on.

Inspired by GitOps principles, our proxy centralizes configuration validation, bringing consistency and compliance across clients without forcing you to change each client application.

As a result, you greatly reduce the risk of one misinformed client causing problems for others and the maintenance overhead of keeping client libraries up to date.

Conduktor’s proxy is a practical way to introduce GitOps to your Kafka deployment. It’s stateless, so you can scale it horizontally by adding more instances and distributing the traffic with a load balancer.
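To make that concrete, here is a minimal Docker Compose sketch; the broker addresses are placeholders, and while the image name, the KAFKA_BOOTSTRAP_SERVERS variable, and port 6969 follow Conduktor’s public Gateway distribution, treat the details as assumptions to verify against the documentation:

services:
  gateway:
    image: conduktor/conduktor-gateway:latest
    environment:
      # Placeholder broker addresses; point these at your own cluster
      KAFKA_BOOTSTRAP_SERVERS: broker-1:9092,broker-2:9092
    ports:
      - "6969:6969"   # clients use the gateway, not the brokers, as their bootstrap server

Client applications then only change bootstrap.servers to point at the gateway; no other code changes are needed.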

These demos can get you started with Conduktor Gateway; you can also explore the documentation to discover some more scenarios.

Boost your consistency and agility with Kafka GitOps

As Kafka adoption scales, new applications bring new requirements, and teams face even more configuration and management complexities.

Following a GitOps approach to managing Kafka resources, client configurations, and rules increases automation, traceability, and collaboration in your development.

If you'd like to scale your Kafka deployment and need a framework for achieving consistency, standardization, and streamlined collaboration, book a Conduktor demo.

You can also join the community — our team will happily chat about how Conduktor can help you adopt Kafka GitOps at scale.
