
4 AI Reliability Challenges for Enterprise Media Companies


Sara Gates

Sara is a content strategist and writer at Monte Carlo.

As every organization seemingly races to adopt AI, we can learn a lot from early use cases and success stories. But it may be even more valuable to hear about – and learn from – the challenges of implementing enterprise AI products. 

Recently, we sat down with the data science team at a major media company to discuss exactly that. We talked about their plans for GenAI and the challenges they’ve encountered as they incorporate large language models (LLMs) into their data products — while prioritizing consistency and reliability. 

This media company has been building both traditional machine learning and emerging generative AI applications, primarily in advertising and customer operations. At the core of their advertising business, ML models drive critical forecasting and optimization decisions: predicting future impressions, optimizing ad schedules, and measuring campaign performance through reach and frequency metrics. 

On the customer side, their data science team has developed sophisticated models for risk scoring to identify potential customer churn, customer segmentation (using both supervised and unsupervised learning), and analysis of customer research interviews using LLMs.

In other words, this media company has invested heavily in embedding AI into their operations. But as their projects grow more sophisticated and business-critical, they’ve encountered several significant challenges in ensuring reliable, high-quality results. 

If that sounds familiar, you're clearly not alone. Let's dig in and see what AI reliability challenges look like at this leading media brand.

Challenge 1: Complex Data Dependencies

Like most enterprise orgs, this media company is working with diverse data sources — and managing those complex dependencies has been one of the first hurdles to clear when it comes to implementing AI. 

“We’re working with a pretty large set of features that are derived from a bunch of different sources that are all getting pulled together,” explains the Director of Data Science. “It’s things like campaign performance, CRM data, order data from the finance domain, qualitative data like survey data.”

Silos make this even more challenging, even within the company’s Databricks instance. The company uses a medallion architecture, where data flows from raw (bronze) to standardized (silver) to aggregated (gold) layers. But in practice, each team creates their own separate data transformations directly from the raw data. Different data scientists are building different models with totally disconnected datasets. 

This practice isn’t unusual, but it can lead to problems. Using siloed datasets often creates blind spots that can impact model performance and make it difficult to maintain data quality across the AI lifecycle — as we’ll see in a moment.
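To make the pattern concrete: in a medallion setup, standardization is supposed to happen once in the silver layer, with every downstream model drawing from it — rather than each team re-deriving features from bronze. A minimal plain-Python sketch of that intent (all table and column names here are hypothetical, not this company's schema):

```python
# Sketch of a medallion-style flow (bronze -> silver -> gold) in plain Python.
# Table and column names are hypothetical, for illustration only.

def to_silver(bronze_rows):
    """Standardize raw records once, so every team shares one cleaned layer."""
    silver = []
    for row in bronze_rows:
        silver.append({
            "account_id": str(row["account_id"]).strip(),
            "spend": float(row.get("spend", 0) or 0),
            "region": (row.get("region") or "unknown").lower(),
        })
    return silver

def to_gold(silver_rows):
    """Aggregate the shared silver layer into model-ready features."""
    gold = {}
    for row in silver_rows:
        agg = gold.setdefault(row["account_id"], {"total_spend": 0.0, "orders": 0})
        agg["total_spend"] += row["spend"]
        agg["orders"] += 1
    return gold

bronze = [
    {"account_id": " A1 ", "spend": "100.0", "region": "EMEA"},
    {"account_id": "A1", "spend": "50.0", "region": "emea"},
    {"account_id": "A2", "spend": None, "region": None},
]
gold = to_gold(to_silver(bronze))
```

When each team instead writes its own `to_silver` directly against bronze, the same raw record can yield different features in different models — which is exactly the blind spot described above.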

Challenge 2: Manual Change Tracking

Building and managing AI models is still surprisingly manual, even for an enterprise media company. 

For example, their data team uses AutoML in Databricks to build their risk scoring model, but still relies on repetitive steps and manual processes to ensure its outputs remain accurate. 

As their Data Science team lead explains, “It actually goes back through a model selection and model validation process. The whole thing basically gets rebuilt once a week. One of the reasons for that was to counter the possibility of data drift — knowing that our data changes over time, knowing that buying behaviors change across the year, knowing that our sales ops teams are constantly changing the way that CRM gets used, and how things pop up in the data.”

The team has also built custom processes to track changes in risk scores and detect these kinds of distribution shifts over time, all manually implemented within their Databricks jobs. While they haven’t encountered major issues yet, they’re bracing for bigger challenges ahead as they transition from one CRM system to Salesforce. 

The team knows these manual processes and this reactive approach add an operational burden — and leave them vulnerable to unexpected data quality issues that could impact model performance.
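One common way to automate that kind of distribution-shift check is a population stability index (PSI) computed between a baseline sample and current data. This is a generic, hedged sketch of the technique — not the team's actual Databricks implementation:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and current data."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]
    e, a = bucket_fractions(expected), bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]        # uniform on [0, 1)
same = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]   # mass pushed to the upper half
```

A rough rule of thumb treats PSI below 0.1 as stable and above 0.25 as a major shift worth investigating — the kind of threshold a scheduled job could alert on instead of rebuilding the model every week by default.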

Challenge 3: Model Reliability and Consistency

One of the team’s most cutting-edge AI use cases came out of a hackathon. The POC was relatively easy to build, but ensuring reliable and consistent outputs has proven much more difficult.

They had a wealth of qualitative data, thanks to the customer research team’s practice of conducting 90-minute client interviews. Traditionally, the researchers manually parsed these interview transcripts to identify key themes and create qualitative datasets. So the data team built a POC of an application that could take a Microsoft Teams transcript, run it through an LLM, and get back high-level themes, subtopics, and the frequency with which those themes appeared in the transcript. 

But as the data team started to build out the application, they ran into model performance issues.

“Getting them even to just behave consistently is a huge challenge,” explains the Data Science team lead. “We’re trying to catch all of the quirky things that it does, especially when we’re trying to get it to generate a structured output.”

The team has had to come up with workarounds to handle these inconsistencies. As the team lead describes, “We have some hacky if-then stuff to catch when it does weird things to a JSON string.” 
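That kind of guard usually amounts to stripping markdown fences and cutting the response down to its outermost braces before parsing. A hedged sketch of the general pattern (not their exact code):

```python
import json

def parse_llm_json(raw):
    """Best-effort parse of an LLM response that should contain a JSON object.
    Strips common quirks (markdown fences, surrounding prose) before giving up."""
    text = raw.strip()
    # Quirk 1: the model wraps the payload in a ```json ... ``` fence.
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    # Quirk 2: the model adds prose around the object - cut to the outer braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

fenced = '```json\n{"themes": ["pricing"], "count": 3}\n```'
chatty = 'Sure! Here is the result: {"themes": ["support"]} Hope that helps.'
```

Returning `None` on failure (rather than raising) lets the pipeline retry the LLM call or flag the transcript for manual review.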

While these solutions work most of the time, they represent a patchwork approach to ensuring reliability — and as their LLM applications grow more complex, they’ll need a more comprehensive solution.

Challenge 4: Limited Monitoring and Quality Control

When it comes to monitoring AI systems, like most teams, this media company finds itself limited by basic tooling that wasn't built for the complexity of modern AI applications.

Out of the box, they're using MLflow in Databricks to track basic model metrics like AUC (area under the curve). But those metrics don't cover the broader reliability issues that can impact AI systems. This means when model performance drops, the team has no easy way to understand what happened with the underlying data.
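For context, AUC reduces to the probability that a randomly chosen positive example is scored above a randomly chosen negative one — which is exactly why it says nothing about which upstream data changed. A minimal pure-Python illustration of the metric itself (MLflow simply logs the resulting number):

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.
    Ties count as half a win. O(n^2) pairwise version, for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model can hold a steady AUC while an entire feature silently goes stale — the metric only sees the final ranking, not the inputs that produced it.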

As their Director of Data Science puts it: “If there are any issues in the performance of the model…we have to go back and look at the dataset. And that requires a lot of work because we don’t know where the gap is.” 

The data scientists have to spend precious time going back into the originating datasets to manually investigate why performance is suffering. There’s no easy way to trace issues across the pipeline. 

Without automated, comprehensive monitoring across their entire AI pipeline, they’re left playing detective every time something goes wrong — a situation that becomes increasingly unsustainable as AI becomes more central to their business operations.

The Recurring Theme: A Need for End-to-End Observability

These challenges — complex data dependencies, manual change tracking, reliability and consistency, and limited monitoring — aren’t unique to this media company. They’re the growing pains of enterprise AI. And they all point to a common need: comprehensive data observability across data and AI pipelines. 

Data scientists shouldn’t be spending their valuable time on detective work and manual processes. And as AI becomes more critical to business operations, this kind of manual approach goes from repetitive to downright risky and entirely unsustainable.

Data observability can help by automatically surfacing data quality issues early, before they impact model performance. Instead of regularly rebuilding models or writing hacky fixes for LLM outputs, observability lets teams proactively identify and address problems at their source. And rather than maintaining separate data transformation pipelines for each model, they can standardize their approach with confidence that data quality issues will be caught automatically.

Want to learn more about how real teams are addressing AI challenges? Check out our recent stories on using LLMs to score customer conversations and building a GenAI chatbot — and get in touch with the Monte Carlo team to see how we can help your organization achieve AI-readiness.

Our promise: we will show you the product.