The Data Layer Is the Moat

AI doesn't work without a foundation. Infrastructure first, models second.

February 1, 2026 · 6 min read · Paritosh Kulkarni

Everyone wants to add AI to manufacturing. Almost no one has the data infrastructure to support it.

This is not a criticism of the teams. It's a structural problem. Manufacturing data was never designed to be queried by a language model. It was designed to be read by a specific version of a specific application, usually one that hasn't been updated since 2009.

What makes manufacturing data hard

Three properties make factory data uniquely hostile to AI pipelines:

Schema heterogeneity: every ERP customization creates a private dialect. Two plants running the same vendor's software often have incompatible schemas.
Temporal misalignment: ERP records transactions after the fact. MES records events in real time. Joining them requires reconciling two different theories of when things happened.
Sparse labeling: the defect codes, hold reasons, and exception flags that make data useful for AI are filled in inconsistently, if at all.
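
To make the first two properties concrete, here is a minimal sketch in Python. It assumes two hypothetical plants whose ERP exports name the same fields differently, and it matches an ERP posting to the latest MES event at or before it, within a tolerance window. The field names, the mapping table, and the 15-minute tolerance are all illustrative assumptions, not taken from any real system.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical per-plant field mappings onto one canonical schema.
FIELD_MAPS = {
    "plant_a": {"WO_NUM": "work_order", "POST_TS": "posted_at"},
    "plant_b": {"OrderNo": "work_order", "PostingTime": "posted_at"},
}


def normalize(plant, record):
    """Rename a plant-specific ERP record's fields to canonical names.

    Fields without a mapping are dropped rather than guessed at.
    """
    mapping = FIELD_MAPS[plant]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def latest_event_before(event_times, posted_at, tolerance):
    """Return the latest MES event time at or before posted_at.

    event_times must be sorted ascending. Returns None when no event
    falls within `tolerance` of the ERP posting time.
    """
    index = bisect_right(event_times, posted_at)
    if index == 0:
        return None  # every MES event happened after the posting
    candidate = event_times[index - 1]
    return candidate if posted_at - candidate <= tolerance else None


# Illustrative data: MES events in real time, one ERP posting after the fact.
mes_events = [
    datetime(2026, 1, 5, 8, 0, 0),
    datetime(2026, 1, 5, 8, 12, 30),
    datetime(2026, 1, 5, 9, 45, 0),
]
erp_row = normalize(
    "plant_b",
    {"OrderNo": "1001", "PostingTime": datetime(2026, 1, 5, 8, 20, 0), "Clerk": "x"},
)
matched = latest_event_before(mes_events, erp_row["posted_at"], timedelta(minutes=15))
```

The tolerance is the design choice that matters here: a posting with no MES event close enough is left unmatched instead of being attached to whatever event happens to be nearest.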

The infrastructure-first principle

Every team we've worked with that tried to add AI before solving the data problem eventually rewrote the AI layer. The teams that solved data first shipped faster and kept the gains.

The reason is simple: if your AI produces outputs based on stale or misaligned data, the operations team stops trusting it within weeks. Rebuilding trust takes months. Rebuilding infrastructure takes weeks.
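
One way to protect that trust is to have the AI layer abstain when its inputs are stale. The sketch below is a hedged illustration of that idea in Python; the function names and the five-minute threshold are assumptions chosen for the example, and a real threshold would depend on the process being monitored.

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness threshold, for illustration only.
MAX_AGE = timedelta(minutes=5)


def is_fresh(last_updated, now, max_age=MAX_AGE):
    """True when the source data was updated within max_age of now."""
    return now - last_updated <= max_age


def answer_or_abstain(prediction, last_updated, now):
    """Return the prediction only when the underlying data is fresh.

    When the data is stale, return an explicit message so operators
    see a refusal instead of a confident answer built on old data.
    """
    if is_fresh(last_updated, now):
        return prediction
    age_minutes = int((now - last_updated).total_seconds() // 60)
    return f"No recommendation: source data is {age_minutes} min old"


now = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
fresh = answer_or_abstain("Expedite order 1001", now - timedelta(minutes=2), now)
stale = answer_or_abstain("Expedite order 1001", now - timedelta(minutes=30), now)
```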

What the moat looks like

A manufacturer with a functioning real-time data layer (ERP event streams joined to MES activity in under 60 seconds) can run every AI use case on top of it. A manufacturer without that layer has to solve it separately for every use case.
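
The 60-second figure is something a team can measure directly. As a rough sketch in Python, assume each joined record carries the MES event time and the time the joined row became queryable (both field names are hypothetical); the share of records meeting the target is then a simple ratio.

```python
from datetime import datetime, timedelta

# Join-latency target from the text: ERP and MES joined in under 60 seconds.
SLA = timedelta(seconds=60)


def sla_compliance(records, sla=SLA):
    """Return the fraction of records whose join lag is strictly under sla.

    Each record is a dict with hypothetical keys "event_at" (when the MES
    event happened) and "joined_at" (when the joined row became queryable).
    """
    if not records:
        raise ValueError("need at least one record to measure compliance")
    within = sum(1 for r in records if r["joined_at"] - r["event_at"] < sla)
    return within / len(records)


# Illustrative records with lags of 20, 45, and 90 seconds.
base = datetime(2026, 1, 5, 8, 0, 0)
sample = [
    {"event_at": base, "joined_at": base + timedelta(seconds=20)},
    {"event_at": base, "joined_at": base + timedelta(seconds=45)},
    {"event_at": base, "joined_at": base + timedelta(seconds=90)},
]
compliance = sla_compliance(sample)
```

Tracking this ratio over time is one way to tell whether the layer is behaving as a shared capability or quietly degrading as new sources are added.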

The data layer is not a project. It's a capability. And like any capability, the earlier you build it, the longer its returns have to compound.