4 GenAI Opportunities from Real Data Teams
The funny thing about hype is that it’s always at its apex when information is at its lowest. And GenAI is no different.
A lot of organizations want to talk about AI, but it’s tough to find teams that are actually leveraging it in a meaningful way. (Although we did create the above image using DALL-E, and I think you could say it’s pretty meaningful.) But when you find a data leader who’s on the real AI journey first-hand (no, not Midjourney), it’s natural to have a few questions.
That’s why I sat down with Dustin Shoup, Principal Data Engineer at PayNearMe; Mike Carpenter, Senior Manager of Data Strategy and Analytics at Mission Lane; and Dana Neufeld, Data Product Manager at Fundbox during our recent Data Quality Day to understand their strategies for operationalizing generative AI use cases, and how data quality is impacting the process.
Interested? Let’s dive in.
1. Your internal knowledge base is a greenfield for AI
For many organizations, the knowledge base is a central focus point for increasing product value and adoption. The wealth of knowledge stored in their Confluence or Jira instance is like a treasure chest just waiting to be unlocked. And now GenAI offers even more opportunities to do just that, with organizations like Atlassian already offering early examples of this use case.
Now the PayNearMe team is working in the same direction, leveraging GenAI to maximize the value of their extensive knowledge base for their customers. “At PayNearMe, we’re doing a lot of GenAI on our auxiliary business [rather than the core data flow business],” says Dustin.
And the first step for Dustin’s team? Getting their data quality in order.
“We’re working on our infrastructure and getting the right tooling in place, making sure we have the right data quality under the hood,” says Dustin.
“Data quality is key to what we’re doing, and the data quality is already there in the documentation, in Confluence, etc. So we’re pointing GenAI in that direction and doing a lot of beta testing… When we know what’s going in, we can better understand what’s coming out.”
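Under the hood, pointing GenAI at an existing knowledge base usually follows the retrieval-augmented pattern: pull the most relevant passages out of the documentation, then hand them to the model alongside the question. Here is a minimal sketch of that retrieval step, not PayNearMe’s actual implementation; the chunking and keyword-overlap scoring are deliberate simplifications (production systems typically use embeddings):

```python
# Minimal retrieval-augmented sketch (illustrative only): find the
# documentation chunks most relevant to a question, then build a prompt
# that asks the LLM to answer from those chunks alone.

def chunk(text, size=200):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(question, passage):
    """Crude relevance score: count of shared lowercase words."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def build_prompt(question, pages, top_k=3):
    """Assemble an LLM prompt from the top_k most relevant chunks."""
    passages = [c for page in pages for c in chunk(page)]
    best = sorted(passages, key=lambda p: score(question, p), reverse=True)[:top_k]
    context = "\n---\n".join(best)
    return f"Answer using only this documentation:\n{context}\n\nQuestion: {question}"
```

The point echoes Dustin’s: when you know exactly which documentation goes in, it’s much easier to judge what comes out.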
2. Self-service is still relevant, now more than ever
For Mission Lane, the name of the generative AI game is interoperability. In a world where free money is a thing of the past, data teams are doing everything they can to drive more stakeholder value at scale, and demonstrate that commitment to the executive team.
For the last few years, self-service architecture has been the gold standard for accelerating value. Building a self-service culture meant cost-efficiency, enabling business users and practitioners to leverage existing processes to discover new data, create pipelines, and surface insights. And now GenAI is offering new fuel for the self-service fire.
As the Mission Lane data team has continued to invest in their dbt semantic layer, they’ve recognized an opportunity to leverage Snowflake Copilot to create a self-service insight engine.

“If we can quantify the quality of our data that’s used in these semantic models and define these semantic definitions, then we can start enabling our non-technical users to query from that using Copilot,” says Mike. “That creates the culture of self-service we’re trying to get to.”
Integrating their semantic layer with Copilot will enable downstream data consumers to surface insights on their own, as long as everyone is clear on the definitions up front.
“When you’re thinking about the semantic layer, a lot of it is about conversation,” says Mike. “The coding aspect is relatively straightforward for a lot of metrics in the dbt semantic layer.” He believes that potential problems arise when there are varying definitions across the organization. “The confusion comes from a lack of communication upfront and defining things incorrectly, and then you end up in the same place you were before.”
The semantic layer can help a business work toward shared goals, but that means establishing common definitions and validating the quality of that data up front. Without shared understanding and solid data quality, there is no self-service.
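To make the common-definitions point concrete, here is a toy Python sketch of a shared metric registry, the role a semantic layer plays for self-service tooling. The metric names and SQL below are hypothetical; a real dbt semantic layer generates queries from governed metric definitions rather than storing raw SQL:

```python
# Toy sketch of a shared metric registry (hypothetical names and SQL).
# Every interface, including a copilot, resolves questions against the
# same definition, so two users asking the same question get one answer.

METRICS = {
    "active accounts": "SELECT COUNT(DISTINCT account_id) FROM accounts WHERE status = 'active'",
    "total revenue": "SELECT SUM(amount) FROM payments",
}

def resolve(question):
    """Match a question to exactly one registered metric, or fail loudly."""
    hits = [name for name in METRICS if name in question.lower()]
    if len(hits) != 1:
        raise ValueError(f"Ambiguous or unknown metric for: {question!r}")
    return METRICS[hits[0]]
```

The failure mode Mike describes, varying definitions across the organization, shows up here as the `ValueError`: if a question cannot be pinned to a single agreed definition, self-service stalls.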
While using the semantic layer as a source for GenAI use cases is purely theoretical at this point, it’s one Mike’s team is exploring intentionally, with the hope of launching a proof-of-concept solution in the coming months as more Snowflake features become generally available.
Watch this space!
3. Data asset optimization is the need of the hour
Imagine if you could ask ChatGPT a question like, “What data assets should I optimize based on daily usage?” or “What are my top three most expensive data products based on compute costs?”
Dustin, Mike, and Dana agree there’s a significant opportunity to leverage GenAI to optimize internal resources and compute costs.
“The Snowflake budget is a big chunk for many teams,” says Dustin. “Maybe we could run the output of a Snowflake query, plug it into GenAI, and see if the value is there. Do we need to put in engineering time to reduce costs? We can ask the [LLM] some questions, and it can help identify root cause flows that can be reduced.”
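That workflow, running a cost query and handing the output to an LLM, is mostly a matter of turning query results into a well-framed prompt. A minimal sketch, assuming the rows have already been fetched from Snowflake (the column shape and wording here are illustrative assumptions, not a prescribed schema):

```python
# Hypothetical sketch: summarize Snowflake spend for an LLM prompt.
# `rows` stands in for the result of a query against usage views;
# the (query_tag, credits_used) shape is an assumption for illustration.

def cost_report_prompt(rows):
    """Turn (query_tag, credits_used) rows into a question for an LLM."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    lines = [f"- {tag}: {credits:.1f} credits" for tag, credits in ranked]
    return (
        "Here is last week's Snowflake spend by query tag:\n"
        + "\n".join(lines)
        + "\n\nWhich workloads are the best candidates for cost "
        "optimization, and what should we investigate first?"
    )
```

The returned string would then be sent to whatever LLM the team uses, with the model's answer treated as a starting point for engineering triage rather than a verdict.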
Mike agrees. “[We can ask:] what’s the most expensive thing impacting us right now?” This type of GenAI use case not only increases a data team’s efficiency, it also helps prove the team’s ROI.
Monte Carlo is already hard at work developing these types of features. Monte Carlo’s recently released Performance feature allows users to easily filter queries related to specific DAGs, users, dbt models, warehouses, or datasets. Users can then drill down to spot anomalies and determine how query performance was impacted by changes in code, data, and warehouse configuration.

4. The biggest takeaway? Reliable AI needs reliable data.
Unsurprisingly, at the heart of our GenAI discussion was the centrality of data quality. And each of these teams leverages data observability from Monte Carlo to get a comprehensive look into the health of their data across pipelines, including the pipelines feeding GenAI use cases.
“Monte Carlo helps us a ton with visibility into data quality for our users and developers,” says Dana. “We use dbt and insight testing for our developers, and it’s way easier to spot quality issues in the code. Now, we’re looking into integrating Monte Carlo and dbt to gain access to documentation and give better access to our users, through lineage, documentation within the code, etc.”

But it’s not enough for your data team to know that your data is good. Your stakeholders need to know it too, and why it matters.
“The most challenging thing when it comes to data quality is getting buy-in from the senior leadership level,” says Mike. “Data is very important, but data quality is not necessarily emphasized as a way to quantify the value of your data at any given time.”
Dustin says it’s essential to communicate your definition of data quality to understand whether it’s shared across the org. “We know what we know, and then we talk with our stakeholders… [We’re trying to find out] if we need a special metric for a stakeholder’s area of the business, or if they can use mine.”
The key to solving these challenges?
“Have a good conversation with upstream and downstream people in the organization,” recommends Dustin.
Data trust requires context. Communication is key to developing a shared understanding of the data, its quality, and the value it’s driving. “If you don’t have that data understanding between all the players, then data quality has a fuzzy definition. You can’t do much from there. You have to be on the same page from the start.”

Data contracts are one solution Dustin recommends that can help teams set and maintain standards as they scale.
But whether you leverage data contracts, workflow management processes, or just a clear directive, the key to operationalizing data quality is getting closer to the business, and helping the business get closer to the data.
GenAI is the arena. Data observability is the cost of admission.
The GenAI revolution is underway. But is your data ready?
In a recent Informatica survey of 600 data leaders, 45% said they’ve already implemented some form of GenAI, while 42% still cited data quality as their number one obstacle.
The key to reliable AI is reliable data. And the key to reliable data is end-to-end data observability.
The time to detect and resolve bad data is before it impacts downstream consumers. And that goes double for data that’s fed into AI models. A tool like Monte Carlo gives data teams at-a-glance visibility into the health of their data from ingestion right down to consumption, so they always know what’s gone bad, who it’s impacting, and how to resolve it.
As Dustin says, “Snowflake monitors the things we know, and Monte Carlo monitors the things we don’t.”
Because building valuable AI is hard work, but managing your data quality shouldn’t be.
To learn more about how Monte Carlo’s data observability can bring high-quality data to your GenAI initiative, talk to our team.
Our promise: we will show you the product.