AI Observability Updated Feb 09 2024

4 GenAI Opportunities from Real Data Teams

AUTHOR | Sydney Nielsen

The funny thing about hype is that it’s always at its apex when information is at its lowest. And GenAI is no different.

A lot of organizations want to talk about AI, but it’s tough to find teams that are actually leveraging it in a meaningful way. (Although, we did create the above image using DALL-E, and I think you could say it’s pretty meaningful.) But, when you find a data leader who’s on the real AI journey first-hand (no, not Midjourney), it’s natural to have a few questions.

That’s why I sat down with Dustin Shoup, Principal Data Engineer at PayNearMe; Mike Carpenter, Senior Manager of Data Strategy and Analytics at Mission Lane; and Dana Neufeld, Data Product Manager at Fundbox during our recent Data Quality Day to understand their strategies for operationalizing generative AI use cases – and how data quality is impacting the process.

Interested? Let’s dive in.

1. Your internal knowledge base is a greenfield for AI

For many organizations, the knowledge base is a central focus point for increasing product value and adoption. The wealth of knowledge stored in their Confluence or Jira instance is like a treasure chest just waiting to be unlocked. And now GenAI offers even more opportunities to do just that – with organizations like Atlassian already offering early examples of this use case.

Now the PayNearMe team is working in the same direction as they leverage GenAI to maximize the value of their extensive knowledge base to their customers. “At PayNearMe, we’re doing a lot of GenAI on our auxiliary business [rather than the core data flow business],” says Dustin.

And the first step for Dustin’s team? Getting their data quality in order.

“We’re working on our infrastructure and getting the right tooling in place, making sure we have the right data quality under the hood,” said Dustin.

“Data quality is key to what we’re doing, and the data quality is already there in the documentation, in Confluence, etc. So, we’re pointing GenAI in that direction, and doing a lot of beta testing… When we know what’s going in, we can better understand what’s coming out.”
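A pipeline like the one Dustin describes typically retrieves relevant documentation first, then asks the model to answer from that context. Here is a minimal, illustrative sketch of the retrieval step using simple keyword-overlap scoring – the documents and question are placeholders, not PayNearMe’s actual knowledge base or implementation:

```python
# Minimal retrieval sketch: score knowledge-base snippets by keyword overlap
# with the user's question, then assemble a grounded prompt for an LLM.
# All documents and questions here are illustrative placeholders.

def score(question: str, document: str) -> int:
    """Count how many of the question's words appear in the document."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words)

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents ranked by keyword overlap."""
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    return ranked[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble the context-plus-question prompt that would go to the model."""
    context = "\n---\n".join(retrieve(question, documents))
    return f"Answer using only this documentation:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are processed within 5 business days via the payments API.",
    "The on-call rotation schedule lives in the engineering handbook.",
    "Payment retries use exponential backoff with a 24 hour cap.",
]
prompt = build_prompt("How long do refunds take to process?", docs)
```

Production systems would swap the keyword scoring for embedding-based search, but the shape is the same: “knowing what’s going in” means controlling exactly which documents end up in the prompt.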

2. Self-service is still relevant – and even more so

For Mission Lane, the name of the generative AI game is interoperability. In a world where free money is a thing of the past, data teams are doing everything they can to drive more stakeholder value at scale – and demonstrate that commitment to the executive team.

For the last few years, self-service architecture has been the gold standard for accelerating value. Building a self-service culture meant cost-efficiency, enabling business users and practitioners to leverage existing processes to discover new data, create pipelines, and surface insights. And now GenAI is offering new fuel for the self-service fire.

As the Mission Lane data team has continued to invest in their dbt semantic layer, they’ve recognized an opportunity to leverage Snowflake’s Co-Pilot to create a self-service insight engine.

A vision of where the dbt Semantic Layer should sit within a modern data stack architecture via dbt.

“If we can quantify the quality of our data that’s used in these semantic models and define these semantic definitions, then we can start enabling our non-technical users to query from that using Co-Pilot,” says Mike. “That creates the culture of self-service we’re trying to get to.”

Integrating their semantic layer with Co-Pilot will enable data consumers downstream to surface insights on their own – as long as everyone is clear on the definitions up front.

“When you’re thinking about the semantic layer, a lot of it is about conversation,” says Mike. “The coding aspect is relatively straightforward for a lot of metrics in the dbt semantic layer.” He believes that potential problems arise when there are varying definitions across the organization. “The confusion comes from a lack of communication upfront and defining things incorrectly, and then you end up in the same place you were before.”

The semantic layer can help a business work toward shared goals, but that means establishing common definitions and validating the quality of that data up front. Without shared understanding and without data quality, there is no self-service.
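The core idea is that every metric gets exactly one agreed-upon definition, and anything without a definition fails loudly instead of being guessed at. A toy sketch of that pattern – the metric names, SQL expressions, and model names below are invented for illustration, not Mission Lane’s actual semantic models:

```python
# Sketch of a shared metric registry: one canonical definition per metric,
# so self-service tools and stakeholders agree on what a number means.
# Metric names, expressions, and source models are illustrative only.

METRICS = {
    "active_users": {
        "description": "Distinct users with at least one event in the period",
        "expression": "COUNT(DISTINCT user_id)",
        "source_model": "fct_events",
    },
    "revenue_usd": {
        "description": "Sum of captured payment amounts, in US dollars",
        "expression": "SUM(amount_usd)",
        "source_model": "fct_payments",
    },
}

def compile_metric(name: str) -> str:
    """Resolve a metric name to SQL, failing loudly if no shared definition exists."""
    if name not in METRICS:
        raise KeyError(f"Metric '{name}' has no agreed definition yet")
    metric = METRICS[name]
    return f"SELECT {metric['expression']} AS {name} FROM {metric['source_model']}"
```

This is essentially what a semantic layer does at scale: a text-to-SQL assistant that compiles queries from the registry can only produce numbers the organization has already agreed on.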

While the application of the semantic layer as a source for GenAI use cases is purely theoretical at this point, it’s one Mike’s team is exploring intentionally, with the hope of launching a proof-of-concept solution in the coming months as more Snowflake features become generally available.

Watch this space!

3. Data asset optimization is the need of the hour

Imagine if you could ask ChatGPT a question like, “What data assets should I optimize based on daily usage?” Or, “What are my top three most expensive data products based on compute costs?”

Dustin, Mike, and Dana agree there’s a significant opportunity to leverage GenAI to optimize internal resources and compute costs. 

“The Snowflake budget is a big chunk for many teams,” says Dustin. “Maybe we could run the output of a Snowflake query, plug it into GenAI, and see if the value is there. Do we need to put in engineering time to reduce costs? We can ask the [LLM] some questions, and it can help identify root cause flows that can be reduced.”

Mike agrees. “[We can ask:] what’s the most expensive thing impacting us right now?” This type of GenAI use case not only increases the efficiency of a data team, but also helps prove the ROI of your data team.
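The deterministic half of what Dustin and Mike describe – pulling query history, ranking it by cost, and packaging the worst offenders for the model – is simple to sketch. The rows and field names below are made up for illustration; a real version would read from the warehouse’s query-history metadata rather than a hardcoded list:

```python
# Sketch: rank warehouse query history by credits consumed, then format the
# top offenders into a prompt an LLM could analyze for optimization ideas.
# The query rows and cost figures are invented placeholders.

def top_expensive(history: list[dict], n: int = 3) -> list[dict]:
    """Return the n most expensive queries by credits consumed."""
    return sorted(history, key=lambda row: row["credits"], reverse=True)[:n]

def build_cost_prompt(history: list[dict], n: int = 3) -> str:
    """Format the most expensive queries into a question for an LLM."""
    lines = [
        f"- {row['query_id']} ({row['credits']} credits): {row['sql']}"
        for row in top_expensive(history, n)
    ]
    return "Which of these queries should we optimize first, and why?\n" + "\n".join(lines)

query_history = [
    {"query_id": "q1", "credits": 12.4, "sql": "SELECT * FROM raw_events"},
    {"query_id": "q2", "credits": 0.3, "sql": "SELECT COUNT(*) FROM users"},
    {"query_id": "q3", "credits": 7.9, "sql": "SELECT * FROM raw_events JOIN users USING (user_id)"},
]
prompt = build_cost_prompt(query_history, n=2)
```

Narrowing the input to the few queries that actually matter keeps the prompt small and the model’s suggestions focused on where engineering time would pay off.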

Monte Carlo is already hard at work developing these types of features for organizations. Monte Carlo’s recently released feature, Performance, allows users to easily filter queries related to specific DAGs, users, dbt models, warehouses, or datasets. Users can then drill down to spot anomalies and determine how query performance was impacted by changes in code, data, and warehouse configuration.

A screenshot of the Monte Carlo Data Performance Dashboard revealing the longest-running queries for the Snowflake warehouse “Transforming_2.”

4. The biggest takeaway? Reliable AI needs reliable data.

Unsurprisingly, at the heart of our GenAI discussion was the centrality of data quality. And each of these teams leverages data observability from Monte Carlo to give their teams a comprehensive look into the health of their data across pipelines – including those pipelines feeding GenAI use cases.

“Monte Carlo helps us a ton with visibility into data quality for our users and developers,” says Dana. “We use dbt and insight testing for our developers, and it’s way easier to spot quality issues in the code. Now, we’re looking into integrating Monte Carlo and dbt to gain access to documentation and give better access to our users – through lineage, documentation within the code, etc.”

But it’s not enough for your data team to know that your data is good. Your stakeholders need to know it too – and why it matters.

“The most challenging thing when it comes to data quality is getting buy-in from the senior leadership level,” says Mike. “Data is very important, but data quality is not necessarily emphasized as a way to quantify the value of your data at any given time.”

Dustin says it’s essential to communicate your definition of data quality to understand if it’s shared across the org. “We know what we know, and then we talk with our stakeholders… [We’re trying to find out] if we need a special metric for a stakeholder’s area of the business, or if they can use mine.”

The key to solving these challenges? 

“Have a good conversation with upstream and downstream people in the organization,” recommends Dustin.

Data trust requires context. Communication is key to developing a shared understanding of the data, its quality, and the value it’s driving. “If you don’t have that data understanding between all the players, then data quality has a fuzzy definition. You can’t do much from there. You have to be on the same page from the start.”

An example of data contract architecture.

Data contracts are one solution Dustin recommends that can help teams set and maintain standards as they scale.

But whether you leverage data contracts, workflow management processes, or just a clear directive, the key to operationalizing data quality is getting closer to the business – and helping the business get closer to the data.

GenAI is the arena. Data observability is the cost of admission.

The GenAI revolution is underway. But is your data ready?

In a recent Informatica survey of 600 data leaders, 45% said they’ve already implemented some form of GenAI, while 42% still cited data quality as their number one obstacle.

The key to reliable AI is reliable data. And the key to reliable data is end-to-end data observability.

The time to detect and resolve bad data is before it impacts downstream consumers. And that goes double for data that’s fed into AI models. A tool like Monte Carlo gives data teams at-a-glance visibility into the health of their data from ingestion right down to consumption, so data teams always know what’s gone bad, who it’s impacting, and how to resolve it.

As Dustin says, “Snowflake monitors the things we know, and Monte Carlo monitors the things we don’t.”

Because building valuable AI is hard work – but managing your data quality shouldn’t be.

To learn more about how Monte Carlo’s data observability can bring high-quality data to your GenAI initiative, talk to our team.

Our promise: we will show you the product.