The Future of Data Streaming: Current 2023 & Big Data London

Insights and learnings from the forefront of data streaming

James White

Nov 16, 2023

Context

’Twas that time of year once again! The annual pilgrimage across the pond to the largest Kafka conference in North America (Current ’23). With conference season in full swing, Conduktor also took the opportunity in September to exhibit at Big Data London and speak at the new ‘Fast Data’ theatre.

Armed with Conduktor-themed swag, we couldn’t wait to hit the floor and talk with companies that were there to discuss their data streaming initiatives (or woes!). We were ready to engage and excited to gauge the reception to our latest product developments.

You can check out both of our talks from the events below, and find feedback and learnings from the field just beneath that.

Talks

From POC to Mission Critical: How to build a fast data tooling strategy

In this talk, James & Stephane explore both technical and organizational challenges you will wrestle with as your data streaming maturity grows and how to equip yourself to handle upcoming problems.

Fast data enables organizations to extract greater value from the data flowing through their business. It’s now pivotal to remaining competitive in all major industries. Despite this, many organizations fall into the trap of scaling fast data adoption without an underlying tooling strategy. Building on the foundations of a POC that eventually lands in production, few think longer-term about the challenges they will meet along the way. In this talk, we’ll explore both technical and organizational challenges you will wrestle with as your fast data maturity grows, to equip you to handle upcoming problems.

From the battlefield: Squeezing the most from your fast data infrastructure

In our second talk, Stuart explores the problems you’ll experience from your infrastructure expanding and many clever solutions to mitigate them.

In the early stages, building your Fast Data strategy is easy and cheap: its scope is small, and things are moving fast. Engineers spend more time building pipelines, teams spin up extravagant resources and you find yourself and others routinely diverting attention from your core business. In this talk, we’ll explore the problems you’ll experience from your infrastructure expanding and many clever solutions to mitigate them.

Kafka Proxy, you say?

That’s right, it was our first chance to gauge, in person, the North American market’s reception to Conduktor Gateway, our extensible Kafka wire proxy. There was enough interest in the subject to give us plenty of opportunity to sharpen our elevator pitch:

Conduktor Gateway is a Kafka proxy that sits between your client applications and existing Kafka cluster(s). It speaks the Apache Kafka protocol, so it can be dropped into your existing infrastructure and your applications need only point to a new bootstrap server.
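As a rough illustration of that "point to a new bootstrap server" step, the sketch below copies a client's configuration and swaps only the `bootstrap.servers` entry. The host names and the helper `point_at_gateway` are hypothetical and not part of any Conduktor API; `bootstrap.servers` is the standard Kafka client property.

```python
def point_at_gateway(client_config: dict, gateway_bootstrap: str) -> dict:
    """Return a copy of the client config that connects through the proxy.

    Everything except the bootstrap address is left untouched, which is the
    whole point: the application itself does not need to change.
    """
    updated = dict(client_config)
    updated["bootstrap.servers"] = gateway_bootstrap
    return updated


# Hypothetical addresses, for illustration only.
direct = {
    "bootstrap.servers": "kafka-1.internal:9092",
    "group.id": "orders-service",
}
via_gateway = point_at_gateway(direct, "gateway.internal:9092")
print(via_gateway)
```

The original dictionary is not modified, so the same application could be switched back to a direct connection by reverting one setting.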

The next logical question of any curious punter was one of why? Either that, or they grasped the value of proxy architecture immediately and swiftly moved on to questions of performance and scaling!

For anyone new to this concept, the question of why comes back to problems that arise in organizations that adopt Kafka as middleware. Kafka doesn’t ship with all of the controls you might expect from enterprise technology, particularly if you drill into areas like security and governance.

To name a few examples:

  • Kafka doesn’t provide an out-of-the-box mechanism for end-to-end encryption of topic data. The alternative is an encryption strategy that’s only as good as the least informed client.

  • Kafka comes with a multitude of configuration options. This makes it difficult to defend against a rogue producer that causes havoc for others.

  • As adoption of Kafka grows in an organization, so too does the challenge of auditing which applications are doing what.
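To make the "rogue producer" point a little more concrete, here is a minimal sketch of the kind of guardrail a central layer could enforce. The policy values and the function `policy_violations` are assumptions made for illustration, not Conduktor Gateway's actual rules or API. `acks` and `compression.type` are real Kafka producer settings; a proxy would see the equivalent information in requests on the wire rather than in a config dictionary.

```python
# Hypothetical policy: values a platform team might require of every producer.
REQUIRED_PRODUCER_SETTINGS = {"acks": "all"}
ALLOWED_COMPRESSION = {"gzip", "snappy", "lz4", "zstd"}


def policy_violations(producer_config: dict) -> list:
    """Return human-readable violations; an empty list means compliant."""
    violations = []
    for key, expected in REQUIRED_PRODUCER_SETTINGS.items():
        actual = producer_config.get(key)
        if actual != expected:
            violations.append(f"{key} must be {expected!r}, got {actual!r}")
    # Kafka producers default to no compression when the setting is absent.
    compression = producer_config.get("compression.type", "none")
    if compression not in ALLOWED_COMPRESSION:
        violations.append(f"compression.type {compression!r} is not allowed")
    return violations


for problem in policy_violations({"acks": "1"}):
    print(problem)
```

Checking once, in one place, avoids relying on every team remembering every setting.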

Of the various directions you could take a Kafka proxy, it was end-to-end encryption that consistently resonated with the audience. We’re living in a world of data streaming and data exchange; handling sensitive and PII data is part of the game.

Kafka Costs Rising

Cloud spending was a constant theme at both events. This likely reflects macro factors such as inflation and geopolitical uncertainty. We spoke to numerous companies that had 3x’d their managed Kafka spend within just 2 years. On one side, this is testament to the growth and adoption of data streaming internally, but it also highlights the need for a cost mitigation strategy.

As such, Conduktor’s ability to act as a virtual layer was a welcome answer to the cost explosion. In a paradigm akin to VMware and virtualized machines, Conduktor brings virtualization in front of managed Kafka services, ensuring you squeeze the most out of what you already pay for.

This vendor-agnostic layer allows you to increase utilization per cluster, while it can also be used as a mechanism for data isolation (i.e. multi-tenancy, with tenant-specific safeguarding rules). This paradigm can be taken even further: virtualizing ‘topics’ via simple SQL statements to provide projections of the underlying data. The advantage, of course, is not having to deploy and maintain additional infra or redundant resources for a simple transformation.
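To picture what a projection means here, take a hypothetical statement such as `SELECT order_id, country FROM orders WHERE country = 'UK'`. The sketch below is a plain Python stand-in for that idea, not Conduktor's implementation: the function `virtual_topic`, the record fields and the sample data are all invented for illustration. It derives the view lazily from the source records rather than copying them into a second topic.

```python
from typing import Callable, Iterable, Iterator


def virtual_topic(
    records: Iterable[dict],
    fields: list,
    predicate: Callable[[dict], bool],
) -> Iterator[dict]:
    """Yield a filtered, column-limited view of the source records.

    Nothing is stored: each record is projected as it is read, so there is
    no second copy of the data to deploy, replicate or keep in sync.
    """
    for record in records:
        if predicate(record):
            yield {name: record[name] for name in fields}


# Invented sample data standing in for messages on a physical topic.
orders = [
    {"order_id": 1, "country": "UK", "card_number": "4111-xxxx"},
    {"order_id": 2, "country": "FR", "card_number": "5500-xxxx"},
]

uk_orders = list(
    virtual_topic(orders, ["order_id", "country"], lambda r: r["country"] == "UK")
)
print(uk_orders)
```

Note that the sensitive `card_number` field never reaches the consumer of the view, which is the same property that makes projections attractive for sharing data.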

Because Conduktor Gateway is Kafka protocol compliant, whether it’s a virtual cluster or a virtual topic, the resource functions just like the real thing. The result is an abstraction that can be invisible to the application teams that depend on them.

Data Exchange

Confluent heavily focused on ‘data products’ throughout the keynote. It was a gateway to showcasing their new data portal, which focuses on democratizing data streams within an organization. It ties together data mesh principles such as data discovery, shareable business metadata, lineage and domain-driven ownership.

There is no doubt this is a valuable product for Confluent customers. Absolutely, it can help solve challenges around internal data exchange between different teams and business units.

At the same time, there was little emphasis on sharing data externally, for example sharing real-time data with third-party partners. Our takeaway from the two events was that this is a problem still relevant in most enterprise organisations. The result is often painful ETL processes or expensive replication processes.

Snowflake have famously championed their marketplace for sharing warehouse data externally. Conduktor are now enabling real-time sharing of Kafka data without building replication processes that bring a cost and operational overhead. Much easier to share a virtual view that’s derived from the source-of-truth, rather than recreate it, right?

There was a dedicated ‘Apache Flink Forest’ at Current this year. Confluent are betting big on widescale adoption of Flink for stream processing. Not just for the operational data estate but also the analytical data estate, acknowledging how critical it now is for data to flow between the two.

Confluent’s new offering will be the first truly cloud-native, serverless Flink service. This means automatic upgrades, auto-scaling and paying only for what you use. Though it’s too early to determine if Flink will pick up where ksqlDB has (arguably) failed, we will certainly be watching developments from the sidelines.