Proactive and Reactive: The Two Paths Towards Data Quality

Ingesting low-quality data will negatively impact AI and ML outputs—essentially “garbage in, disaster out.” Fix data quality problems at the source—before they travel downstream.

12.06.2025

To paraphrase an old sports saying, AI is only as good as its last input. Low-quality data, such as broken or missing schemas, inaccurate or incomplete fields, inconsistent formatting, and range violations, can result in hallucinations, model drift, and degraded accuracy.

Confluent’s latest data streaming report lays this out in greater detail. In its survey of 4,000 tech leaders, 68% of respondents named data quality inconsistencies as their greatest data integration challenge. Interestingly, 67% cited uncertain data quality, a related problem that likely stems from a lack of visibility and monitoring capabilities.

The proactive and reactive approaches to data quality

In general, organizations can ensure data quality in two ways. They can take a proactive approach, filtering and stopping inconsistent data from entering their environment in the first place, or they can take a reactive approach, cleaning up errors and discrepancies after ingestion.

These two methodologies require different tools. By working proactively, teams “shift left,” moving upstream, closer to the data source and the beginning of the data lifecycle. This method requires specialized, streaming-compatible tooling for schema enforcement, policy enforcement, and quality gates.
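
As a rough illustration of what such an upstream quality gate can look like, here is a minimal sketch that validates each event against a schema before it is ever produced to a Kafka topic, and parks rejects on a separate topic instead. The topic names, the schema, and the choice of the confluent_kafka and jsonschema libraries are illustrative assumptions, not a description of any particular product.

```python
# Minimal sketch of a producer-side quality gate (illustrative names and schema).
import json

from confluent_kafka import Producer
from jsonschema import Draft7Validator

ORDER_SCHEMA = {
    "type": "object",
    "required": ["order_id", "amount", "currency"],
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "pattern": "^[A-Z]{3}$"},
    },
}

validator = Draft7Validator(ORDER_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})


def produce_if_valid(topic: str, record: dict) -> bool:
    """Only records that satisfy the schema ever reach the stream."""
    errors = [e.message for e in validator.iter_errors(record)]
    if errors:
        # Reject at the source and park the record for inspection,
        # instead of letting it travel downstream.
        producer.produce("orders.rejected", json.dumps({"record": record, "errors": errors}))
        return False
    producer.produce(topic, json.dumps(record))
    return True


produce_if_valid("orders", {"order_id": "o-123", "amount": 42.5, "currency": "EUR"})
producer.flush()
```

The specific libraries matter less than the placement: validation runs before the event enters the stream, so downstream consumers never see the bad record.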

In contrast, teams that take a reactive approach to data quality “shift right,” moving downstream toward mission-critical applications such as AI and analytics. Unfortunately, reactive strategies have a significant downside: by the time issues (such as broken schemas, missing values, or mismatched formats) are detected, the damage is usually already done. Dashboards may display inaccurate metrics, AI models may deliver flawed predictions, and decision makers may lack key context and insights.
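
A reactive check, by contrast, typically runs as a batch scan after the data has already landed. The file name, column names, and rules in the sketch below are hypothetical; the point is that by the time the scan flags a row, dashboards and models may already have consumed it.

```python
# Minimal sketch of a reactive, post-ingestion check (hypothetical file and columns).
import pandas as pd

# The data is already in the lake or warehouse by the time this runs.
df = pd.read_parquet("orders_2025-06-12.parquet")

violations = (
    df["amount"].isna()
    | (df["amount"] < 0)
    | ~df["currency"].str.fullmatch(r"[A-Z]{3}", na=False)
)

flagged = df[violations]
print(f"{len(flagged)} of {len(df)} rows violate quality rules")
# Remediation, and tracing where each flagged row came from, still lies ahead.
```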

To complicate matters, the reactive approach also requires significant effort after the fact. Engineers not only have to root out the poor-quality data, they also have to trace its lineage, identifying where it originated, how it was transformed, and which applications it may have infiltrated. This often means combing through logs and audit trails, and coordinating across different teams: a long, error-prone process that pulls engineers away from their core duties.

One way to quantify this is the 1:10:100 rule, a rough breakdown of the monetary cost of fixing data quality errors at each stage. For every dollar spent validating data at the source, it costs 10 dollars to resolve the same error at the transformation stage and 100 dollars to repair it at the point of consumption. Costs balloon as data quality issues move downstream, demanding ever more expensive fixes and introducing more risk.
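
Applied to a concrete, entirely made-up batch, the rule plays out as follows; the record count and the one-dollar baseline are assumptions for illustration only.

```python
# Hypothetical application of the 1:10:100 rule to a batch of 1,000 bad records,
# assuming a baseline of $1 per record to validate at the source.
bad_records = 1_000
cost_at_source = bad_records * 1            # $1,000 caught at ingestion
cost_at_transformation = bad_records * 10   # $10,000 fixed mid-pipeline
cost_at_consumption = bad_records * 100     # $100,000 repaired after dashboards and models consume it
print(cost_at_source, cost_at_transformation, cost_at_consumption)
```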

Why AI and ML need to proactively guarantee data quality

As AI becomes increasingly real-time, covering use cases such as fraud detection, personalized retail experiences, and even autonomous vehicles, inputs must also shift from batch to streaming ingestion. In these situations, proactively guaranteeing data quality becomes even more important, given the potential for compromised customer credentials, lost revenue, and even safety issues. As Conduktor CTO Stéphane Derosiaux puts it, “it’s garbage in, disaster out.”

Because of the time-sensitive nature of these applications, organizations can no longer count on post-ingestion cleanups to resolve their data quality issues. A wait-and-see approach raises the stakes, as flawed data pollutes customer-facing experiences, operational systems, and AI pipelines before anyone detects the issue. And since organizations increasingly rely on real-time insights for competitive advantage, shifting quality enforcement right is simply too risky.

For enterprise leaders, poor data quality creates business risks, driving up operational costs, clouding decision making, and undermining their organization’s overall success. Untrustworthy data leads to untrustworthy AI, weakening product launches, customer experiences, and compliance postures.

Shifting left with confidence

Clearly, organizations can no longer count on reacting to data problems after the fact; instead, they must introduce quality enforcement as early as possible, ideally before bad data can affect downstream applications.

As a streaming-native platform for enforcing data quality, Conduktor Trust enables teams to:

  • Truly shift left and enforce schema and payload rules at the source

  • Prevent bad data from infiltrating downstream applications—and causing costly issues

  • Provide in-stream visibility into data quality across your entire environment

Some streaming data technologies devolve governance and quality enforcement to individual producers and teams. While decentralization sounds good in theory, since it returns control to the practitioners closest to operations, in practice it can be fraught with friction, especially when teams already have existing tools in use.

For instance, every team would have to implement exactly the same configurations and procedures for data quality validation, and any mistake can open gaps in coverage. Synchronizing these processes across many producers and teams can also be time-consuming and expensive, especially in large environments with many clusters feeding data into one another asynchronously.

Instead, Trust provides a single, central interface from which to proactively monitor and validate data quality. This means you can build and govern rules once, then apply them anywhere, forgoing many hours of tedious, manual work.

If you'd like to learn more about Conduktor Trust, book a demo today.
