Data Observability

The Role of Data Observability in Building Reliable GenAI Systems

Lindsay MacDonald

Lindsay is a Content Marketing Manager at Monte Carlo.

Over the past several years, data observability tooling has emerged as a must-have across industries, from retail to security. This renewed focus on data quality is bringing much-needed visibility into the health of technical systems.

As generative AI (and the data powering it) takes center stage, it’s critical to bring this level of observability to where your data lives: your data warehouse, data lake, or data lakehouse.

At IMPACT this last year, we sat down with Krishnan Parasuraman, VP and Head of Global Field CTO Office at Snowflake, to discuss the evolving and increasingly pertinent role of data observability in AI, as well as Snowflake’s vision for the future of LLMs and the data industry.

All eyes on GenAI

One thing is certain: generative AI isn’t going anywhere anytime soon. For most organizations, GenAI represents opportunity – for increased efficiency, output, monitoring, and more. GenAI has the ability to lighten the load of ad-hoc and repetitive tasks and open data teams up to more strategic work.

Snowflake has embraced the GenAI opportunity with open arms. “GenAI is a strategic investment for us at Snowflake,” says Krishnan. “We have a strategic partnership with Nvidia, and we have our own LLM models now. It’s a big focus and priority for us.” 

The Snowflake team’s investments are bringing GenAI into reality. Their Snowpark Container Services enable developers to register and deploy containerized data apps using secure Snowflake-managed infrastructure with configurable hardware options, like accelerated computing with NVIDIA GPUs. This will expand the scope of AI/ML and app workloads that can be brought directly to Snowflake data. 

In addition, Snowflake Cortex enables organizations to quickly analyze data and build AI applications directly within Snowflake. Combining these new developments with tools like Monte Carlo, a Snowflake Horizon partner, means Snowflake customers can accelerate the adoption of high quality data for their AI models. 

But bringing AI development closer to your data is just the first step. What sort of GenAI use cases can we expect to emerge? Krishnan shares his thoughts on what LLMs will bring to the data space.

3 LLM use cases

Krishnan categorizes the benefits of LLMs into three primary use cases. For him, GenAI will be most helpful when it comes to:

1. Knowledge summarization

“This is something that LLMs are really good at,” says Krishnan. “They can look at mass bodies of text and content and synthesize and summarize it for you.” 

He gives the example of GenAI for a financial services use case, which might leverage multiple internal data sources: loan documents, claims and underwriting forms, prospectuses, 401k submissions, and more. An effective LLM could look at all of this data across domains and summarize it with actionable insights. 

“Let’s say a new compliance regulation comes up. What do you need to do? GenAI and an LLM can answer that question for you by summarizing in a very succinct way.” 

With the right knowledge summarization model, GenAI can save massive amounts of time for a team utilizing large, distributed data repositories on a daily basis.

2. Interactive engagement

Among the most popular GenAI use cases already in operation, chatbots and other customer engagement tools present a massive opportunity for generative AI models to provide new value.

As Krishnan puts it, use cases that have historically consisted of human-to-human or human-to-machine interaction, like customer support chats, a contact center, or even traditional search within an enterprise, are all ripe for the GenAI picking.

“These could potentially be replaced by GenAI,” says Krishnan. “The [LLM] engagement with these kinds of tools has become much more interactive.”

Thinking through the interactive engagement use cases for your organization is essential. If there’s knowledge coverage for high-volume service topics or frequently asked questions, there’s higher potential for GenAI.

3. Content generation

For Krishnan, net-new content creation based on knowledge generated by an LLM has the potential to produce fascinating use cases – and more useful content than ever before.

“I’m particularly excited about this one,” he says. 

“Our life sciences customers are thinking about [LLMs for] discovery and how they can use this to generate protein sequences based on amino acid knowledge. Our retail customers are thinking about how they can come up with more detailed product descriptions or identify comarketing products. Media companies are thinking about more personalized content for individuals. The opportunities are just boundless.”

Across industries and use cases, GenAI presents the opportunity to produce output that’s more detailed, more personalized, and more effective. 

“GenAI will definitely make us more productive. That’s a very natural outcome,” says Krishnan. “There will be challenges, but organizations have to balance that by bringing humans into the room as much as possible.”

So, if you were wondering whether GenAI is coming for your job next, Krishnan says, “don’t fret.” GenAI can’t come up with new ideas – it can only recycle the data it’s fed. That’s why it’s essential to ensure the data pipelines feeding your LLMs are full of high-quality, reliable data.

Data observability is crucial for reliable GenAI

Getting started with GenAI can be as simple as developing an API call to OpenAI within your product. But just because an API call lowers the barrier to entry doesn’t mean you should walk through the door. You want to make sure your GenAI initiative is generating real value.

To do that, organizations will need to give their models access to high-quality proprietary business data. But, as Krishnan says, “Making sure you’re delivering value in a secure manner is very important.”
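As a concrete sketch of what “grounding a model in proprietary data” can look like, the snippet below assembles a chat payload that constrains the model to business context. The helper name and message shapes are illustrative assumptions, not a specific vendor’s API; in practice, the returned list would be passed to an LLM chat-completions endpoint.

```python
# Sketch: grounding a GenAI prompt in proprietary business data.
# `build_grounded_messages` is a hypothetical helper, not a vendor API.

def build_grounded_messages(context_docs, question):
    """Assemble a chat payload that grounds the model in supplied documents."""
    context = "\n\n".join(context_docs)
    return [
        {
            "role": "system",
            "content": "Answer only from the provided business context.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        },
    ]

messages = build_grounded_messages(
    ["Loan document: the fixed interest rate is 6.1%."],
    "What is the loan's interest rate?",
)
```

The pattern matters because the model’s answer quality is bounded by the quality of `context_docs` – which is exactly where data reliability enters the picture.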

The old familiar adage rings as true as ever when it comes to GenAI: garbage in, garbage out. An LLM’s output is only as correct as its input data is reliable – and data observability helps teams build that trust with a holistic understanding of the end-to-end lineage of their data.
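To make “garbage in, garbage out” concrete, here is a minimal sketch of the kinds of checks a data observability tool automates before data reaches an LLM pipeline. The field names and thresholds are assumptions for illustration, not any product’s actual API.

```python
from datetime import datetime, timedelta, timezone

# Illustrative quality gates: completeness (null rate) and freshness.
# Field names and thresholds here are assumptions, not a product API.

def null_rate(rows, field):
    """Fraction of rows where `field` is missing."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)

def is_fresh(last_loaded_at, max_age=timedelta(hours=24)):
    """True if the table was loaded within the allowed window."""
    return datetime.now(timezone.utc) - last_loaded_at <= max_age

rows = [
    {"claim_id": 1, "amount": 100.0},
    {"claim_id": 2, "amount": None},  # half the rows are missing `amount`
]

checks = {
    # 50% nulls exceeds an (assumed) 10% tolerance, so this check fails.
    "amount_null_rate_ok": null_rate(rows, "amount") <= 0.10,
    # Data loaded two hours ago is within the (assumed) 24-hour window.
    "freshness_ok": is_fresh(datetime.now(timezone.utc) - timedelta(hours=2)),
}
```

A failing check like this is the signal to stop the pipeline before unreliable records ever reach a model, rather than debugging a confidently wrong LLM answer after the fact.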

“Data observability is an integral part of your data platform,” says Krishnan. Over the last 10 years, he says, organizations have begun to trust their data platforms as their single source of truth. And that trust can’t be built sustainably – especially when it comes to GenAI – without data observability.

“One of the things that organizations fail to understand is the importance of data quality and data observability for AI,” says Krishnan. “A lot of gaps pop up if you’re not focused on quality as a foundational issue, especially when it comes to AI.”

As organizations increasingly look to their data platform as their central source of truth, data observability provides the transparency and accountability they need — when and how they need it. 

Want to learn more about how data observability can enable your team to build reliability into your GenAI initiative? Let’s chat!

Our promise: we will show you the product.