Case Studies, AI Observability · Updated Mar 6, 2026

How Axios Is Delivering Reliable AI with Agent Observability

AUTHOR | Michael Segner

Removing Newsroom Friction With AI

Axios is an AI-forward media organization. From early experimentation with transformer models to a strategic partnership with OpenAI, the company has consistently looked for ways to apply AI where it delivers real value—without compromising journalistic standards.

Senior Data Scientist Shreye Saxena and the Axios data + AI team are deploying more than a dozen LLM-powered agents to remove friction from the most tedious tasks in the newsroom. 

“We started with a roadshow. Our four-person team met just about everyone in the newsroom,” said Shreye. “It started with upskilling, and the conversation quickly became about how we could help yield efficiencies from things they found cumbersome in their workday.”

One example was rounding up events in reporters’ local areas. Now, instead of Googling and compiling this information manually, ChatGPT gathers the information automatically and imports it into the CMS for easy review.

“Our big, hairy audacious goal as a company is to expand our local news and journalism, and we aren’t going to be able to do that without taking advantage of AI and the AI ecosystem,” said Shreye.

Another repetitive editorial task Shreye and team are tackling is the automation of the article tagging process. Once an article has been written it needs to be categorized and mapped within a taxonomy so Axios can maximize relevance for both advertisers and readers.

This initiative has accelerated newsroom velocity, but there have been challenges as well. For example, an article about U.S. tax policy was incorrectly tagged with geopolitical conflict topics because both mentioned the same legislator.

Axios needed to take the next step in its AI maturity journey: agent observability.

Axios’s Internal AI Evaluations Hit a Wall

“Observability has always been a strong part of Axios’ engineering culture,” said Shreye. “What changed was the ecosystem that we were operating in. Instead of a self-hosted transformer model, we are now relying on an API call to OpenAI. We needed to assess what is our current level of maturity and how can we advance that in a systemic way?”

That level of maturity quickly evolved from a “vibes-based evaluation” involving spot-checking a few outputs to a custom-built framework using LLM-as-judge evaluations. While a significant upgrade, it still posed challenges with cost and scale.

The internal tool captured prompts and responses sent to OpenAI and ran a second LLM call to evaluate the first. Each response was scored and labeled with a qualitative judgment plus a rationale.

“It was fine, but we quickly realized that we can’t run an evaluation on every single LLM call or we will double our total token cost,” said Shreye. “We wanted to do some type of sampling. We wanted better insight into how the auto-tagger was performing in the aggregate.”
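Axios hasn’t published its internal framework, but the pattern Shreye describes — a second LLM call that judges the first, run only on a sampled fraction of traffic to keep token costs down — can be sketched roughly as below. The sample rate and the `call_judge` stub are illustrative assumptions; a real implementation would replace the stub with an actual judge-model call and a scoring rubric.

```python
import random

SAMPLE_RATE = 0.10  # hypothetical: judge ~10% of calls instead of all of them


def call_judge(prompt: str, response: str) -> dict:
    """Stand-in for the second LLM call that evaluates the first.

    A real version would send a rubric plus the prompt/response pair to a
    judge model and parse a score, label, and rationale from its reply.
    """
    return {"score": 4, "label": "acceptable", "rationale": "On-topic and grounded."}


def maybe_evaluate(prompt: str, response: str, rng=random.random):
    """Run LLM-as-judge on a sampled fraction of calls, so evaluation
    tokens don't double the total token bill."""
    if rng() >= SAMPLE_RATE:
        return None  # this call was not sampled; no judge tokens spent
    return call_judge(prompt, response)


# rng is injectable so the sampling decision can be forced in examples/tests
result = maybe_evaluate("Tag this article...", "Topics: tax policy", rng=lambda: 0.0)
```

Injecting `rng` keeps the sampling decision deterministic when you need it to be, while production code falls back to `random.random`.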

The other challenge was that the evaluation framework was disconnected from the developer experience.

“We needed to be alerted in Slack to issues when they happened in production, not when someone was checking a dashboard of evaluation results on a Wednesday afternoon,” said Shreye.
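Monte Carlo handles this alert routing in-product, but the general shape of a production Slack alert — a failed evaluation pushed to an incoming webhook the moment it happens — looks something like the sketch below. The threshold, agent name, and message format are illustrative assumptions, not Axios’s actual configuration.

```python
import json
from urllib import request

SCORE_THRESHOLD = 3  # hypothetical: evaluation scores below this fire an alert


def build_alert(agent: str, score: int, rationale: str) -> dict:
    """Build a Slack incoming-webhook payload for a failed evaluation."""
    return {
        "text": (f":rotating_light: {agent} scored {score} "
                 f"(threshold {SCORE_THRESHOLD}): {rationale}")
    }


def post_alert(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook (not called here)."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)


payload = build_alert("auto-tagger", 2, "Tagged tax policy article as geopolitics")
```

Separating payload construction from the network call keeps the alert format easy to test without hitting Slack.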

That’s when Axios started to evaluate and adopt Monte Carlo’s Agent Observability capabilities.

A Unified Data, ML, and Agent Observability Experience

While Axios is organizationally split between the data foundry and AI engineering teams, having a consolidated data + observability platform allows them to collaborate across a familiar, easy-to-use interface.

“We were using Monte Carlo to observe our data ecosystem and our ML model predictions, so being able to incorporate agent observability workflows in just a few clicks with the same familiarity for how we set up our monitors, and alerts, and get observability across our whole platform was attractive,” said Shreye.

Other key requirements for Axios included ease of instrumentation and trace visibility.

“We had abstracted a method in Python, but it’s much easier to capture all of the spans and associated logs with Monte Carlo’s SDK, and the fact it leverages OpenTelemetry is a big plus. While there is more to be done, we were able to get started with two lines of code. That efficiency matters a lot, especially as we look to scale the number of agents we bring into production,” said Shreye. “It’s cool to see traces. OpenAI doesn’t keep a historical log of my API calls, there’s not a reporting layer for that, but with Monte Carlo it’s just a button click away in the UI or easily accessible within our Snowflake environment.”

With Monte Carlo, Axios can now see:

  • Prompt and response traces
  • Token usage and latency
  • Evaluation scores over time
  • Anomalies in quality or behavior

Leveraging anomaly detection alongside evaluations also simplified the monitoring process.

“It’s also hard for me to answer what does it mean for the agent to be anomalous, but for Monte Carlo it’s a drop-down and I love that.”
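Monte Carlo exposes anomaly detection as a drop-down, so the definition of “anomalous” is handled for you. Conceptually, though, a monitor like this often amounts to something like a rolling z-score over a metric such as evaluation scores. The sketch below is a minimal illustration of that idea — the threshold and windowing are assumptions, not Monte Carlo’s actual method.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest score if it sits more than z_threshold standard
    deviations from the mean of recent scores."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # any deviation from a perfectly flat series
    return abs(latest - mu) / sigma > z_threshold


# Recent evaluation scores hover around 4; a sudden drop to 1.5 stands out.
scores = [4.1, 4.0, 4.2, 3.9, 4.1, 4.0]
flagged = is_anomalous(scores, 1.5)
```

In practice the history would be a rolling window of recent scores per agent, so the baseline adapts as behavior drifts.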

The Four Pillars Of Agent Monitoring

Monte Carlo provides data quality, operational, trajectory, and output evaluation monitors to answer four questions:

  • Is the agent retrieving the right context, and is that context correct?
  • Is the agent performing efficiently?
  • Is the agent behaving as intended?
  • Is the agent producing outputs that are fit for purpose?

“It’s important for us to be able to monitor all dimensions of agent reliability including the context, behavior, and outputs,” said Shreye. “If the injected data is incorrect, or if the trajectory is wrong and it produces a response without checking the database, then we’re not going to get the outcomes we need from our agent. As we start to focus on more customer-facing AI features, performance metrics like latency will become even more critical.”

As alerts come in, the Axios team is starting to flesh out their incident response processes as well.

“When an alert fires, we can now look into the exact trace or span that triggered it,” said Shreye. “We can then decide whether we need to adjust our prompts, switch models, or refine the evaluation criteria and get back on track.”

Building Trust as Axios Scales AI

For Axios, AI is central to their goals to expand local journalism and deliver the highest quality news reporting to communities across the country. Shreye and his team are accelerating that mission by building the AI agent development and governance platform the news organization needs to scale their ambitions—one that editors, engineers, and leadership can trust.

Agent Observability is continuously helping data and AI teams scale their agents with confidence. Learn more about Monte Carlo’s Agent Observability capabilities here.
