What hundreds of enterprise data teams are actually doing — and where the industry is quietly headed, whether vendors admit it or not.
Every year around this time, we share our predictions for where the data space is headed — not as a thought experiment, but as a signal from the ground. We work with hundreds of enterprise data teams across Penguin Random House, Euronext, CMA CGM, and others. We see budget allocation decisions. We see where trust breaks down. We see what actually ships into production and what stays in the slide deck.
2026 is a different year. AI is no longer a pilot. Agents are no longer theoretical. And the stakes around data reliability have never been higher — because now, it's not a dashboard that gets it wrong. It's a model that trains on the error. Or an agent that acts on it.
Schema changes will continue to break pipelines. Null values will still corrupt dashboards. Volume anomalies will still surface at 2 a.m. on a Tuesday. These are not new problems.
Too many 2026 predictions skip this entirely and jump straight to agents and semantic layers. But if your foundation is broken, everything you build on top of it is standing on sand. What is changing is not whether things break — they will — but how fast you detect it, how accurately you quantify the impact, and how autonomously you resolve it.
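The failure modes named above — null values, volume anomalies, schema drift — reduce to checks any team can reason about. A minimal sketch in Python; the table snapshot, thresholds, and function names are invented for illustration and are not any vendor's API:

```python
# Illustrative versions of three classic data quality checks:
# null rate, volume anomaly, and schema drift. All names and
# thresholds here are hypothetical.

EXPECTED_SCHEMA = {"order_id": "int", "amount": "float", "region": "str"}

def null_rate(rows, column):
    """Fraction of rows where `column` is None."""
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def volume_anomaly(today_count, history, tolerance=0.5):
    """Flag if today's row count deviates more than `tolerance`
    (as a fraction) from the historical mean."""
    mean = sum(history) / len(history)
    return abs(today_count - mean) / mean > tolerance

def schema_drift(observed_schema):
    """Columns added or removed relative to the expected schema."""
    expected, observed = set(EXPECTED_SCHEMA), set(observed_schema)
    return {"added": observed - expected, "removed": expected - observed}

rows = [
    {"order_id": 1, "amount": 19.9, "region": "EU"},
    {"order_id": 2, "amount": None, "region": "US"},
]
print(null_rate(rows, "amount"))           # 0.5
print(volume_anomaly(40, [100, 110, 90]))  # True
print(schema_drift({"order_id": "int", "amount": "float", "currency": "str"}))
```

None of this is new, which is the point: the checks are cheap, and the differentiation lies in how fast they fire and how the result is routed.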
The storage question is settled. Apache Iceberg has won, or close enough. Store once, query from anywhere — Snowflake, Databricks, Spark, DuckDB, interchangeably. The flexibility promise is real, and it's the right architectural direction.
And now, with AI reasoning over data infrastructure, the metadata layer is where agents get their intelligence — or where they operate blindly. 2026 is the year metadata stops being an afterthought and becomes the actual battleground.
The consolidation wave we saw last year — Snowflake absorbing more surface area, Databricks moving up-stack, dbt and Fivetran merging — was inevitable. Too many tools, too much glue code. This continues in 2026.
But there's a persistent misconception worth correcting: observability cannot be bundled into a single platform, and it won't be. It's not ingestion. It's not transformation. It's the control plane for all of those operations. And a control plane cannot be owned by one of the systems it's supposed to control.
In a world of fewer, larger platforms, your data still flows across different systems. Failures still propagate across layers. And the business impact of those failures sits entirely outside any single data tool.
For years, data observability focused on the technical dimensions of quality: volume, freshness, schema validity, uniqueness. These matter. But they have systematically missed the thing that actually determines whether data is trustworthy for the business.
Without business context, every alert looks the same. Engineering teams drown in noise. The business finds issues first. And firefighting stays reactive — engineers patching issues they don't fully understand, for stakeholders they can't reach in time.
In 2026, platforms that don't treat business context as a core quality dimension — equal in weight to volume or freshness — will fall short. Not because of a product gap, but because context is the layer where technical reality connects to business reality.
Context isn't the future. It's the difference between alert fatigue and a robust, autonomous data observability framework.
— Salma Bakouk, CEO, Sifflet

ETL agents orchestrating their own pipelines. BI agents embedded in dashboards. Data quality agents replacing manual monitoring workflows. The category explosion is real, and the excitement is justified.
What's also real: most of these agents will fail quietly in production. Not because the models are bad. Because the context layer isn't ready.
An agent reasoning without business context can't distinguish between Dashboard V1 and the CFO revenue dashboard. It doesn't know which pipeline feeds a regulatory report. A model that trains on bad data doesn't fail loudly — it degrades silently. An agent that acts on bad data doesn't ask for clarification. It executes.
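The Dashboard V1 versus CFO revenue dashboard distinction can be sketched as a toy priority function. The asset names, criticality weights, and routing are all hypothetical, invented for this example:

```python
# Hypothetical sketch: the same technical anomaly, scored differently
# depending on business context. Asset names and weights are invented.

BUSINESS_CONTEXT = {
    "cfo_revenue_dashboard": {"criticality": 1.0, "owner": "finance"},
    "dashboard_v1_sandbox":  {"criticality": 0.1, "owner": "analytics"},
}

def alert_priority(asset, anomaly_score):
    """Blend the technical signal with the asset's business criticality."""
    ctx = BUSINESS_CONTEXT.get(asset, {"criticality": 0.5, "owner": "unknown"})
    priority = round(anomaly_score * ctx["criticality"], 2)
    return {"asset": asset, "priority": priority, "route_to": ctx["owner"]}

# The identical freshness anomaly (score 0.9) on two assets:
print(alert_priority("cfo_revenue_dashboard", 0.9))  # priority 0.9  -> page finance
print(alert_priority("dashboard_v1_sandbox", 0.9))   # priority 0.09 -> low noise
```

Without the context table, both alerts look the same; with it, one pages a human and the other stays quiet. An agent needs the same table to act safely.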
In 2026, the agents that ship reliably will be the ones built on observability-native infrastructure — where context is built in from the start, not bolted on after the fact.
The modern data stack was built for dashboards. For humans who can look at an outlier on a chart, apply common sense, and ask someone in Slack what it means.
AI models are not humans. They don't apply common sense to outliers — they train on them. And agents don't pause to ask for context — they act. The tolerance for bad data quietly accepted in analytics becomes existentially costly when the consumer is a model.
In 2026, data reliability becomes the prerequisite for AI reliability. The data teams that understand this first will be the ones actually shipping AI products — rather than explaining to leadership why the pilot never made it to production.
Semantic layers are not a new idea. The original promise — a consistent, business-friendly translation layer between raw data and consumption — has been around for decades. It failed before because the tooling wasn't ready and the use cases didn't demand it urgently enough.
Three forces converge in 2026 to make the return irreversible:
In 2026, the semantic layer stops serving only dashboards and becomes the bridge between raw data and business use cases — powering agents with reliable context, enabling business-impact-driven observability, and finally delivering on a promise the industry made a decade ago.
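At its simplest, a semantic layer is a governed lookup: a business-friendly metric name resolves to one blessed definition instead of ad-hoc SQL. A minimal sketch; the metric, SQL, and downstream assets are invented for illustration:

```python
# Minimal hypothetical semantic layer: business metric names mapped
# to governed definitions. Everything here is illustrative.

SEMANTIC_LAYER = {
    "net_revenue": {
        "sql": "SELECT SUM(amount - refunds) FROM orders WHERE status = 'settled'",
        "owner": "finance",
        "feeds": ["cfo_revenue_dashboard", "regulatory_report_q4"],
    },
}

def resolve_metric(name):
    """What an agent would call instead of improvising its own SQL."""
    definition = SEMANTIC_LAYER.get(name)
    if definition is None:
        raise KeyError(f"No governed definition for metric '{name}'")
    return definition

metric = resolve_metric("net_revenue")
print(metric["owner"])  # finance
print(metric["feeds"])
```

The `feeds` field is what ties the layer back to observability: when the underlying table breaks, the affected business assets are one lookup away.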
We spent the last decade making the data stack bigger. In 2026, it's about getting back to what matters: closing the gap between data and the business.
— Salma Bakouk, CEO, Sifflet

"When something breaks in your data — how long until the business feels it? How long until engineers understand the impact, quantify it, and solve it?"
If the answer is hours or days, you don't have a tooling problem. You have a context problem.
See how Sifflet solves it →

Or reach Salma directly on LinkedIn