New Monte Carlo Research: Two-Thirds of Orgs Ship Agents Before They are Ready to Support Them
The experimentation phase for AI agents is over. For most large enterprises, agents aren’t a pilot project or a proof of concept anymore. Teams are shipping agents that handle real workloads and touch real customer data.
According to a new Monte Carlo study, nearly half of engineers and architects are already running agents in full production, with another 39% in limited production. The question organizations are wrestling with now isn’t whether to deploy, but rather how to keep these systems running and what to do about the reliability, performance, and safety issues that are already emerging.
Today, we’re excited to release Agents in Production: The Builder’s Perspective, a 2026 survey of 260 builders (engineers and architects) and leaders (VPs, directors, and C-suite executives) at organizations with 1,000 or more employees. What we found is a measurable operational crisis unfolding in real time, one that is leading to a significant gap between the leaders setting deployment timelines and the builders actually bringing these agentic initiatives to fruition.

The Speed Problem
Our research found that nearly two-thirds of respondents across both builders and leaders (64%) say their organization deployed AI agents faster than their teams felt fully prepared to support. Among software developers and engineers — the people most directly responsible for keeping these systems operational — that figure climbs to 75%.
Speed under pressure has a cost to performance and reliability, and our findings reveal what that cost really is.
Among builders who said they deployed agents faster than their teams were prepared to support:
- 63% said they have already discovered an agent accessing data or systems they weren’t aware of
- 36% cannot disable or roll back a failing agent within minutes
- 70% expect to significantly rebuild or rearchitect systems they’ve already shipped
As the data shows, the cost of deploying too quickly is not only a future risk. It is already a present operational reality.

What “In Production” Means is Unclear
Fewer than half (46%) of builders say their organization has documented, aligned criteria for what qualifies an agent as ready for real-world deployment. Among leaders, that figure rises to 58%. The gap matters because it points to a tension that surfaces throughout this research: those setting the AI strategy and those executing it are not fully aligned on its key elements.

Leaders and Builders See Things Differently
One of the most striking findings in this research is how differently builders and leaders perceive the same systems and challenges. The familiar divide between those at the top of an organization and those doing the hands-on work shows up clearly in how each group describes its agentic systems.
Leaders, for example, are significantly more likely to report that their organizations treat agents with the same rigor as other production systems — more post-incident reviews, defined SLOs, automated rollbacks. Meanwhile, builders are more likely to find out something went wrong through a customer complaint (52%) or manual engineering effort (42%) than through any automated detection.
The confidence paradox is sharpest at the top. Senior leaders — heads of engineering, VPs, CTOs — report the highest confidence in their visibility, with 82% saying they have clear authority to intervene. Yet 50% of those same leaders have already discovered an agent accessing data or systems they didn’t know about.

The Visibility Gap Impacting the Entire System
The thread running through almost every finding in this report is visibility into the agentic system, or rather the lack thereof.
Only 47% of builders say their systems are easily traceable end-to-end when something goes wrong. The majority are stitching together multiple tools and logs, or spending significant manual effort to trace a failure across layers. Agent behavior — how tools get used, when control flow breaks down, what happens in agent-to-agent interactions — is the single largest blind spot, flagged by 62% of builders.
This lack of visibility underlies many of the other, higher-level challenges: teams discovering agents accessing data they are not authorized to access, teams admitting they will soon need to strip down and rebuild their agentic systems, and the reliance on customer complaints as a detection mechanism. All of these problems trace back to systems that were deployed before anyone could fully see what they were doing.
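To make end-to-end traceability a little more concrete, here is a minimal Python sketch, not a description of any particular product. It assumes every step of an agent run is tagged with a shared trace ID and one of four layers (data input, model call, agent behavior, output). The names `TraceLog`, `TraceEvent`, `LAYERS`, and the example event details are all hypothetical, chosen only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical layer names for illustration; a real system would define its own.
LAYERS = ("data_input", "model_call", "agent_behavior", "output")


@dataclass
class TraceEvent:
    trace_id: str   # shared by every event that belongs to one agent run
    layer: str      # which layer of the system produced the event
    detail: str     # human-readable description of what happened
    ok: bool = True # False marks a failed step


@dataclass
class TraceLog:
    events: List[TraceEvent] = field(default_factory=list)

    def record(self, trace_id: str, layer: str, detail: str, ok: bool = True) -> None:
        """Append one event, rejecting layers outside the known set."""
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.events.append(TraceEvent(trace_id, layer, detail, ok))

    def trace(self, trace_id: str) -> List[TraceEvent]:
        """Return every event for one run, in the order it was recorded."""
        return [e for e in self.events if e.trace_id == trace_id]

    def first_failure(self, trace_id: str) -> Optional[TraceEvent]:
        """Return the earliest failed event in a run, or None if all steps succeeded."""
        for event in self.trace(trace_id):
            if not event.ok:
                return event
        return None


# Example run: one trace ID ties together events from all four layers.
log = TraceLog()
run = uuid.uuid4().hex
log.record(run, "data_input", "loaded customer table")
log.record(run, "model_call", "prompted model for a plan")
log.record(run, "agent_behavior", "tool call timed out", ok=False)
log.record(run, "output", "returned fallback answer")

failure = log.first_failure(run)
print(failure.layer, "-", failure.detail)
```

Because every event carries the same trace ID, a single query locates the failing layer, which is the opposite of stitching together separate tools and logs by hand.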

Accountability Structures Matter More Than Most Organizations Realize
Perhaps one of the most interesting and actionable findings in the data is the impact of how accountability is structured when agents fail.
When accountability is concentrated within the engineering team alone, 70% of those builders expect to significantly rebuild systems they’ve already deployed, and 63% have already discovered unauthorized agent access. When accountability is shared explicitly between engineering and leadership, those numbers drop substantially — to 22% and 39%, respectively.
Shared accountability isn’t just a governance preference. It is a best practice associated with slower, more deliberate deployment, better operational outcomes, and substantially lower rates of the problems that dominate this research.

What This Means
Though this research suggests things may be moving faster than teams are ready for, we are not arguing for a slowdown. Competitive pressure is very high in today’s AI-first environment, and organizations that pause while others ship face their own consequences. What this data argues for is investment in the operational layer that makes deployment sustainable: end-to-end traceability across data inputs, model calls, agent behavior, and outputs; unified visibility rather than fragmented signals stitched together manually; and accountability structures clear enough that the person responsible for a failure has the tools and authority to act on it.
The engineers in this study already understand this. They’re not waiting to be told their current setup is insufficient — they know. What they need is for that understanding to be matched by organizational investment in the infrastructure that closes the gap between deployment and control.
You can read the full report here.