Data Observability, Generative AI | Updated Apr 09, 2025

The Future of Reliable Data + AI—Observing the Data, System, Code, and Model

AUTHOR | Lior Gavish

AI can do a lot these days.

At this very moment, an army of SaaS companies is hard at work infusing AI assistants and copilots into every horizontal B2B workflow known to humankind. ChatGPT can summarize the web to help sales teams prospect. Gemini can polish Google Docs for research teams. GitHub Copilot can even code alongside you like your own pocket-sized Steve Wozniak. The commodities just keep coming.

Still, if DeepSeek taught us anything, it’s that the future of enterprise AI isn’t the next big model release—models are literally a commodity, and increasingly an affordable one. It’s how you integrate AI with your first-party data to deliver new business value that sets you apart. And it’s not sufficient to simply build these data + AI applications – as in any other technological discipline, you have to do it reliably, too.

So, what does it mean to achieve trusted data + AI? In this article, I’ll explore how even the best AI applications can break, and how leading teams are managing reliability at scale across the ever-evolving data + AI estate.

Understanding How Data + AI Can Break

Data + AI applications are complex. AI failures often begin with data issues, impacting the context provided to models in RAG, the information injected into prompts, or the examples fed to models during fine-tuning. That said – there’s much more that can go wrong.

Failures can be boiled down to one of four root causes:

Data

First, you have the data feeding your modern data and AI platform. At its most basic, AI is a data product. Both the foundation models and the enterprise applications they support rely on a vast collection of structured and unstructured data to create outputs and deliver stakeholder value. From model training to RAG pipelines, data is the heart of AI—and any data + AI quality strategy needs to start here.

As with any other data product, bad data from one or more sources can break downstream AI applications. An innocent misplaced NULL value or a slight format change can easily produce wildly inaccurate results downstream. There are endless ways a data source can and does change, and owners of data pipelines and products will inevitably be surprised by them from time to time.
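To make this concrete, here is a minimal sketch of the kind of source-level check that catches a misplaced NULL before it propagates downstream. The table, column name, and threshold are hypothetical stand-ins, not a prescribed implementation.

```python
# Minimal sketch of a data-quality check on a source batch.
# Column names and the null-rate threshold are hypothetical;
# in practice they would be tuned to the pipeline's history.

def null_rate(rows, column):
    """Fraction of rows where `column` is NULL/None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def check_source(rows, column, max_null_rate=0.01):
    """Flag the batch if its null rate exceeds an expected baseline."""
    rate = null_rate(rows, column)
    return {"column": column, "null_rate": rate, "passed": rate <= max_null_rate}

# Example: an upstream change silently introduces NULLs into user_id.
batch = [{"user_id": 1}, {"user_id": None}, {"user_id": 3}, {"user_id": None}]
result = check_source(batch, "user_id")
# Half of the batch is NULL, far above the 1% baseline, so the check fails.
```

The point of a check like this is not the threshold itself but having a baseline at all, so a silent upstream change becomes a visible alert instead of a wrong number on a dashboard.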

System

Data + AI applications rely on a complex, interconnected web of tools and systems to deliver insights, models, and automations. Among the most basic system components, you might include your warehouse or lakehouse, ETL tools like dbt and Fivetran, orchestration tools like Dagster and Airflow, BI tools like Tableau or Looker, and many others. The advent of AI brought along a new set of systems – be it your vector database, agent orchestration framework, or model APIs.

Each of these systems has the potential to cause quality issues in the final product, which means that each of these systems needs to be observed carefully. If it matters for your data pipelines, it matters even more for your AI applications. 

Code

Just because you have a data problem doesn’t mean you have a problem with your data. Faulty code has always been a challenge for software and data products. Bad pull requests, breaking schema changes, and simple human error have been wreaking havoc on data pipelines for years. As much as we test our code, there will always be unintended consequences that only become apparent in production.

But code takes on new weight in the data + AI system. Observing your code in this context could include not only traditional codebases like SQL transformations, Airflow operators, and configuration code – but also the miscellaneous code used to build and control your agents, and even the natural language prompts fed into models.
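One way to treat prompts as observable code is to version them like any other artifact and flag when the deployed prompt drifts from its reviewed version. This is a sketch only – the registry name and prompt text below are hypothetical, and real teams might use git, a prompt-management tool, or CI checks instead.

```python
import hashlib

# Sketch: treating prompts as versioned code artifacts, so an
# unreviewed edit to a production prompt is detectable, much like
# a breaking schema change. PROMPT_REGISTRY is a hypothetical
# stand-in for wherever reviewed prompts actually live.

PROMPT_REGISTRY = {
    "support_summarizer": "Summarize the customer ticket in two sentences.",
}

def fingerprint(text: str) -> str:
    """Stable short hash of a prompt's exact wording."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

# Fingerprints of the reviewed versions, computed at deploy time.
EXPECTED = {name: fingerprint(p) for name, p in PROMPT_REGISTRY.items()}

def detect_prompt_drift(name: str, deployed_prompt: str) -> bool:
    """True if the deployed prompt no longer matches its reviewed version."""
    return fingerprint(deployed_prompt) != EXPECTED[name]
```

The design choice here mirrors code review for SQL or DAGs: a prompt change should be an explicit, attributable event, not a silent edit discovered only through degraded outputs.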

Model

Finally, you have the models themselves. Much like the dashboards and ML models of old (but not that old), their responses are your customer-facing “product.” But as we discussed earlier, the AI responses of today aren’t nearly as simple as the data products of yesteryear (were they ever simple?).

These model responses rely on an everything bagel of data and black-box generative algorithms to deliver their final outputs. Even with the perfect prompt and the most perfectly curated data, the model itself can generate output that’s unfit for purpose – hallucinations, for example.

Hallucinations are only the most obvious defects. As you implement various use cases for AI, you will find that your evaluation criteria for output quality vary wildly – a generated news article might be assessed very differently from an answer to an HR question, which in turn differs from a simple classification of an email as fraud.
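A rough way to picture this is one scorer per use case rather than one universal quality metric. The checks below are deliberately simplistic, hypothetical stand-ins for real evaluators (human review, labeled test sets, or LLM-as-judge pipelines); the use-case names are illustrative.

```python
# Sketch: evaluation criteria differ per use case, so each output
# type gets its own scorer instead of one universal "quality" check.

def eval_news_article(text: str) -> bool:
    # A generated article might be judged (in part) on substance/length.
    return len(text.split()) >= 50

def eval_hr_answer(text: str) -> bool:
    # An HR answer might be required to ground itself in policy.
    return "per the policy" in text.lower()

def eval_fraud_label(label: str) -> bool:
    # A classifier output only needs to be a valid label.
    return label in {"fraud", "not_fraud"}

EVALUATORS = {
    "news_article": eval_news_article,
    "hr_answer": eval_hr_answer,
    "fraud_classification": eval_fraud_label,
}

def evaluate(use_case: str, output: str) -> bool:
    """Route each output to the evaluator for its use case."""
    return EVALUATORS[use_case](output)
```

The routing pattern is the takeaway: as use cases multiply, quality becomes a per-application contract, not a single model-level score.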

Data + AI observability must cover inputs and outputs – it is all or nothing

As with traditional data pipelines, it would be impossible to write a test to catch every possible anomaly within an agentic workflow. (And even if you could, you wouldn’t be able to maintain an ever-growing catalog of checks over time.)
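This is why monitoring tends toward learned baselines rather than hand-written rules: a monitor can learn what “normal” looks like from history and flag departures from it. The sketch below uses a simple z-score on daily row counts purely as an illustration; production systems typically use richer metrics and models.

```python
import statistics

# Sketch: instead of hand-writing a check for every failure mode,
# learn a baseline from history and flag outliers. Daily row counts
# and the z-score threshold are illustrative choices.

def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it sits more than `z_threshold` standard
    deviations from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

daily_rows = [1000, 1020, 980, 1005, 995, 1010]
is_anomalous(daily_rows, 1008)  # within the normal range
is_anomalous(daily_rows, 200)   # a sudden drop is flagged
```

A single learned baseline like this replaces an open-ended catalog of “row count above X” rules, which is exactly the maintenance burden the paragraph above describes.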

Without scalable end-to-end visibility into both AI outputs and inputs (i.e., data pipelines), there is no hope of measuring the reliability of the final product – let alone managing it.

In our opinion, an effective approach to AI reliability should provide coverage for each of the four parts (data, system, code, and model responses) within a single, continuous, and programmatic observability solution that goes beyond detection to make alerts actionable. 

In addition to end-to-end visibility across this foundation, a great data + AI observability solution will provide: 

  • Intelligent monitoring, diagnosis, and resolution tailored to your business across the many systems in your data estate, aligning data engineering, data science, analytics, and governance
  • A prescriptive DevOps-like methodology and workflows that help your data team move from reactive firefighting to proactive reliability 
  • Agentic features that accelerate data + AI observability workflows, including monitoring and troubleshooting issues
  • Immediate time to value so you spend more time using the platform and less time implementing it

At the end of the day, the race to realize your company’s agentic future will be won not by adopting the latest model, but by your ability to reliably deliver business value.

Count us in. 

To learn more about how Monte Carlo can help you deliver reliable data + AI, reach out!

Our promise: we will show you the product.