Why Enterprise AI Projects Stall on Data Access, Not Models

By Stéphane Derosiaux, December 1, 2025

Enterprises are racing to adopt AI, but most engineering teams are fighting a different battle: accessing, moving, and trusting operational data.

This reality came through in our recent panel with leaders from SMBC Group, Marsh McLennan, and Conduktor. The session focused on making valuable data safe for AI, but the conversation quickly shifted to architectural problems that slow AI long before a model is ever trained.

Everyone on the panel agreed: AI isn't the problem. Data movement is.

Image: four-person video-call panel with speakers from SMBC Group, Marsh McLennan, and Conduktor discussing how to make data safe for enterprise AI.

Data Access Controls Block AI Before It Starts

Ask any architect or data engineer where AI projects stall, and the answer won't be "the model". It will be access.

The panelists described operational data constrained by:

Strict role-based access controls
Data residency rules across regions
Sensitivity classifications nobody fully trusts
Spreadsheets leaking data outside governance
The constant fear of accidental misuse

One quote hit hard:

"Not all data is appropriate for everyone, and even accidental misuse still has consequences."

This is the opposite of the AI hype narrative. Before any innovation can happen, teams are stuck in a loop of approvals, clarifications, and risk reviews.
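To make that approval loop concrete, here is a minimal sketch of the kind of check that sits behind an access request, combining role, residency, and sensitivity. Every name in it (Dataset, Requester, REQUIRED_ROLE, check_access) and the policy itself are hypothetical illustrations, not something the panelists described. The one design point worth noting is that a denial returns a reason, which shortens the clarification rounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    region: str        # where the data must stay (residency)
    sensitivity: str   # "public", "internal", or "restricted"


@dataclass(frozen=True)
class Requester:
    name: str
    roles: frozenset
    region: str


# Hypothetical policy: the role required for each sensitivity level.
REQUIRED_ROLE = {"public": None, "internal": "employee", "restricted": "risk-analyst"}


def check_access(requester, dataset):
    """Return (allowed, reason) so a denial is explainable rather than silent."""
    if dataset.sensitivity not in REQUIRED_ROLE:
        # An unknown classification is treated as a denial, not a pass.
        return False, f"unknown sensitivity '{dataset.sensitivity}'"
    if dataset.sensitivity != "public" and requester.region != dataset.region:
        return False, f"residency: data must stay in {dataset.region}"
    role = REQUIRED_ROLE[dataset.sensitivity]
    if role is not None and role not in requester.roles:
        return False, f"missing role '{role}'"
    return True, "ok"


payments = Dataset("payments", region="eu", sensitivity="restricted")
analyst = Requester("ana", frozenset({"employee", "risk-analyst"}), region="eu")
contractor = Requester("bo", frozenset({"employee"}), region="us")

print(check_access(analyst, payments))
print(check_access(contractor, payments))
```

In a real enterprise these rules live in an IAM or policy engine rather than in application code; the sketch only shows why a single request touches several independent constraints at once.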

The blocker isn't ambition. It's architecture.

10+ Hops Before Data Reaches the Lakehouse

Once access is granted, the next challenge emerges: the pipeline itself.

Across all three companies, operational data passes through a staggering number of hops: sometimes 10, 11, even 15 before it lands in a lakehouse.

Every hop introduces risk:

Schema changes that ripple downstream
Different tooling across teams
Inconsistent transformations
Unclear ownership
No shared understanding of semantics
Lineage that exists only in tribal knowledge

Sreeni from Marsh McLennan captured this perfectly:

"By the time the data ends up in your lakehouse, you're not always sure it's the right piece of information anymore."

The pipeline itself has become too fragmented to trust. The industry talks endlessly about model drift, yet the bigger, more dangerous drift is happening inside pipelines that no one has full visibility into.
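One way to picture lineage that does not live in tribal knowledge is to let every hop record who owns it and whether it changed the record. The sketch below is an illustration under assumptions: the envelope layout, the hop names, the owners, and the helpers fingerprint and apply_hop are all hypothetical, and a production system would use a lineage or metadata service instead of carrying this inside the payload.

```python
import hashlib
import json


def fingerprint(record):
    """Stable short hash of a record's content (keys sorted so order is irrelevant)."""
    payload = json.dumps(record, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def apply_hop(envelope, hop_name, owner, transform):
    """Run one hop and append a lineage entry describing what it did."""
    before = fingerprint(envelope["record"])
    record = transform(dict(envelope["record"]))  # copy so hops cannot mutate in place
    entry = {
        "hop": hop_name,
        "owner": owner,
        "changed": fingerprint(record) != before,
    }
    return {"record": record, "lineage": envelope["lineage"] + [entry]}


envelope = {"record": {"customer_id": 42, "amount": "19.90"}, "lineage": []}
envelope = apply_hop(envelope, "ingest", "platform-team", lambda r: r)
envelope = apply_hop(
    envelope, "normalize", "finance-data",
    lambda r: {**r, "amount": float(r["amount"])},
)
envelope = apply_hop(envelope, "load-lakehouse", "analytics", lambda r: r)

# Which hops actually altered the data, and who owns them?
changed_by = [(e["hop"], e["owner"]) for e in envelope["lineage"] if e["changed"]]
print(changed_by)
```

With three hops this is trivial. With fifteen, the same list answers the question the quote above raises: which step changed the data, and who to ask about it.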

Governance Applied at the End of the Pipeline Is Too Late

If data access is the first bottleneck and multi-hop fragility is the second, governance is the third and arguably the most consequential.

The panelists shared challenges that rarely make it into conference keynotes:

Spreadsheets exported outside control
Inconsistent masking and tokenization
Sensitive fields blended with non-sensitive ones
Unvalidated data feeding downstream decisions
Classification models that don't match real usage
Audit logs built for after-the-fact forensics, not prevention

"It's better not to provide any data than provide incorrect data that leads to wrong decisions."

Governance applied at the end of the pipeline is too late. Quality checks, sensitivity detection, controls, and lineage must shift left, closer to the source, before data moves. It's a worldview shift: governance is no longer traffic control. It's an architectural foundation.
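As a rough sketch of what governing at the source could look like, the snippet below flags sensitive fields and replaces their values before a record leaves the producing system. The detection rules (SENSITIVE_NAME_HINTS, EMAIL_PATTERN), the salt, and the function names are assumptions made for illustration. A salted hash is also only a stand-in here: real tokenization normally relies on a vault or a keyed scheme.

```python
import hashlib
import re

# Hypothetical rules: field-name hints plus a value pattern for email addresses.
SENSITIVE_NAME_HINTS = ("ssn", "email", "phone", "iban")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_sensitive(field, value):
    """Flag a field by its name or by what its value looks like."""
    name_hit = any(hint in field.lower() for hint in SENSITIVE_NAME_HINTS)
    value_hit = isinstance(value, str) and EMAIL_PATTERN.match(value) is not None
    return name_hit or value_hit


def tokenize(value, salt="demo-salt"):
    """Deterministic placeholder token (a stand-in for real tokenization)."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return "tok_" + digest[:16]


def govern_at_source(record):
    """Return (safe_record, flagged_fields) before the record leaves the source."""
    safe, flagged = {}, []
    for field, value in record.items():
        if is_sensitive(field, value):
            safe[field] = tokenize(value)
            flagged.append(field)
        else:
            safe[field] = value
    return safe, flagged


raw = {"customer_id": 42, "contact": "ana@example.com", "ssn": "123-45-6789", "balance": 100.0}
safe, flagged = govern_at_source(raw)
print(flagged)
print(safe)
```

Note that the "contact" field is caught by its value, not its name. Checking both is one way to handle sensitive fields blended with non-sensitive ones, which was among the problems the panel listed.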

Data Request Overhead Kills Velocity

The slowest part of data movement isn't the movement. It's everything around it.

A single data request triggers a chain reaction:

Analytics teams define attributes
SMEs decipher which systems contain them
Source teams extract the right tables
Validation teams check consistency
Governance teams classify sensitivity
Engineers assess scaling, contracts, and tools
Transformation teams align semantics

"Most of the time is spent before development, identifying the source, agreeing on transformations, validating quality."

Ask any architect why projects take months, and they'll say the same thing: it's not the code. It's the conversations, the negotiations, the risk tradeoffs.

Until enterprises fix this, AI will always move slower than promised.

Stale Data Undermines AI Predictions

In risk scoring, fraud detection, market trends, or credit exposure, delay becomes a liability.

As Shuchi noted:

"A lot of insights are time-sensitive. If they arrive late, the value is gone."

AI amplifies this. Models trained on stale or inconsistent data can drift into dangerous territory: bias, wrong decisions, broken customer experiences, regulatory violations. Real-time is the new normal, and most pipelines aren't ready for it.
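A simple guard against late data is to give each use case a freshness budget and refuse to score events that exceed it. The sketch below assumes events carry a timezone-aware event_time; the budgets in MAX_AGE and the function names are hypothetical, chosen only to show the idea.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budgets per use case.
MAX_AGE = {
    "fraud_detection": timedelta(seconds=5),
    "credit_exposure": timedelta(hours=1),
}


def is_fresh(event_time, use_case, now=None):
    """True if the event is still within the use case's freshness budget."""
    now = now or datetime.now(timezone.utc)
    return now - event_time <= MAX_AGE[use_case]


def split_by_freshness(events, use_case, now=None):
    """Separate events a model may use from events that arrived too late."""
    fresh, stale = [], []
    for event in events:
        target = fresh if is_fresh(event["event_time"], use_case, now) else stale
        target.append(event)
    return fresh, stale


# A fixed "now" keeps the example deterministic.
now = datetime(2025, 12, 1, 12, 0, 0, tzinfo=timezone.utc)
events = [
    {"id": "a", "event_time": now - timedelta(seconds=2)},
    {"id": "b", "event_time": now - timedelta(seconds=30)},
]

fresh, stale = split_by_freshness(events, "fraud_detection", now)
print([e["id"] for e in fresh], [e["id"] for e in stale])
```

The same two events are both usable for credit exposure and only one is usable for fraud detection, which is the point: "late" depends on the decision the data feeds.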

Five Principles for Intelligent Data Movement

The panel discussed where modern architectures are heading:

Understand data at the source

Detect sensitive fields, semantics, and schema early.

Strengthen contracts

Changes upstream should never silently break downstream logic.

Shift governance left

Quality checks, lineage capture, and controls must happen before movement.

Build context-aware pipelines

Not diagrams. Actual end-to-end lineage with ownership and business meaning.

Automate pre-work

Profiling, validation, classification, and consistency checks should be built in, not bolted on.
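Two of these principles, stronger contracts and automated pre-work, lend themselves to a small sketch. The example below treats a contract as a mapping from field name to expected type, then checks both a proposed schema change and individual records against it. CONTRACT_V1, the field names, and both helper functions are hypothetical; real deployments would more likely use a schema registry with formats such as Avro, Protobuf, or JSON Schema.

```python
# Hypothetical contract: field name -> expected Python type.
CONTRACT_V1 = {"customer_id": int, "amount": float, "currency": str}


def breaking_changes(old, new):
    """List changes in `new` that would silently break consumers of `old`."""
    problems = []
    for field, expected in old.items():
        if field not in new:
            problems.append(f"removed field '{field}'")
        elif new[field] is not expected:
            problems.append(
                f"'{field}' changed type {expected.__name__} -> {new[field].__name__}"
            )
    return problems  # added fields are treated as non-breaking


def violations(record, contract):
    """List the ways a single record fails the contract."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing '{field}'")
        elif not isinstance(record[field], expected):
            problems.append(
                f"'{field}' is {type(record[field]).__name__}, expected {expected.__name__}"
            )
    return problems


# An upstream team proposes a new schema version.
proposed = {"customer_id": int, "amount": str, "note": str}
print(breaking_changes(CONTRACT_V1, proposed))

# Records are validated before they move, not after they land.
print(violations({"customer_id": 42, "amount": 19.9, "currency": "EUR"}, CONTRACT_V1))
print(violations({"customer_id": "42", "amount": 19.9}, CONTRACT_V1))
```

Running the first check in the producer's build, before deployment, is what turns "changes upstream should never silently break downstream logic" from a guideline into a gate.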

Collectively, the panel was describing the same need: an intelligent data movement layer that understands risk, quality, and governance before a single record flows. That's where the industry is heading, and where the biggest innovation is happening.

Watch the Full Conversation

This post captures only part of the discussion. The panel shared far more, including stories about schema drift disasters, data residency dead-ends, lineage headaches, and the internal negotiation required to move data across global organizations.

If you want to hear how three leaders are rethinking data movement inside complex, regulated enterprises:

Watch how leaders are rethinking data movement.