How to Meet Your Data Reliability OKRs with Monte Carlo’s Service-Level Indicators (SLIs)

“We have a service-level agreement (SLA) for our Key Metrics table, which powers our executive dashboards. It needs to be updated every day by 7:00 a.m. When we miss the SLA, we have to be proactive or else we get lots of frustrated emails. Can Monte Carlo alert us if we ever miss this deadline?”

I’ve heard versions of this story dozens of times from customers over the past year.

Fortunately, there’s now a better way to track critical data quality OKRs and restore data trust with your end users: Service-Level Indicators (SLIs) in Monte Carlo.

SLIs make it easy to specify deadlines for when tables should be updated. Setting up an SLI is a 30-second, no-code process that’s easy for technical and non-technical users alike. You can track the SLI’s performance over time, so your data team can say with confidence: “we’ve met our SLAs 97% of the time this quarter, against an objective of 95%.”
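To make the “97% against an objective of 95%” statement concrete, here is a minimal Python sketch of the arithmetic behind it. It is not Monte Carlo code: the `DailyCheck` record and both functions are hypothetical names used only for illustration, and the data is made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DailyCheck:
    """One scheduled SLI evaluation (hypothetical record, not a Monte Carlo type)."""
    day: str            # label for the day the check ran
    met_deadline: bool  # True if the table was updated in time


def sla_attainment(checks: Sequence[DailyCheck]) -> float:
    """Return the fraction of checks that met the deadline."""
    if not checks:
        raise ValueError("no checks recorded")
    return sum(c.met_deadline for c in checks) / len(checks)


def meets_objective(checks: Sequence[DailyCheck], objective: float = 0.95) -> bool:
    """Compare attainment with the objective (95% by default)."""
    return sla_attainment(checks) >= objective


# Illustrative data only: 100 daily checks, the first 3 of which missed.
checks = [DailyCheck(f"day-{i}", met_deadline=(i >= 3)) for i in range(100)]
print(f"{sla_attainment(checks):.0%}", meets_objective(checks))
```

The point of the sketch is that the SLI is the measured fraction, while the objective is the target you compare it against.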

Monte Carlo’s Data Observability Platform offers a powerful suite of ML monitors that provide broad coverage across 1,000s or even 100,000s of tables with little to no configuration. But we’ve learned from customers that tracking SLAs requires specific rules that will trigger an alert if breached. ML-based detectors adapt to changes in your data to minimize false positive notifications, which makes them ill-suited for hard-coded SLAs.

Why? We’ve seen that:

  • Critical tables are sometimes updated just minutes before the SLA is due. If daily updates run between 6:40 a.m. and 6:50 a.m. for a 7:00 a.m. SLA, then a 30-minute delay is a major problem. But 30 minutes for other pipelines may not be noteworthy.
  • Chronically unstable pipelines can lessen the sensitivity of your ML-based detectors. If important advertising data from a partner is late 3 out of 7 days, you probably still want to know each time.

Monte Carlo SLIs solve these issues by letting you hard-code your important deadlines and expectations, triggering alerts when they’re not met.
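The contrast with an adaptive detector can be illustrated with a toy example. The sketch below is not how Monte Carlo’s ML monitors work; it assumes a deliberately simple stand-in (mean plus three standard deviations of past arrival times) and made-up arrival data for a pipeline that is late 3 days out of 7.

```python
from statistics import mean, stdev

DEADLINE = 7 * 60  # 7:00 a.m., expressed as minutes after midnight


def adaptive_threshold(history, k=3.0):
    """Toy learned threshold: mean + k sample standard deviations."""
    return mean(history) + k * stdev(history)


# Made-up arrival times (minutes after midnight); 3 of 7 days are late.
unstable = [405, 520, 408, 540, 402, 530, 410]
today = 530  # the update lands at 8:50 a.m.

# The frequent late days widen the learned threshold, so today looks normal.
flagged_by_adaptive = today > adaptive_threshold(unstable)
# The hard-coded deadline ignores history and still flags the miss.
flagged_by_deadline = today > DEADLINE

print(flagged_by_adaptive, flagged_by_deadline)
```

In this toy setup the history of late arrivals pushes the learned threshold well past 7:00 a.m., while the fixed deadline keeps flagging every miss.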

To set up an SLI in Monte Carlo, go to Monitors > Create Custom Monitor > Freshness and Volume SLI. In this initial release, you can set monitors for Freshness and Volume of data.

What’s the experience like to set up an SLI with Monte Carlo? Here’s how it works:

1) Select the important table to monitor.

2) Set the frequency, schedule, and the breach conditions. For example, “check daily at 7:00 a.m. that the table has received an update in the previous 3 hours.” Click “Add Monitor.”

3) Go to Settings > Notifications to define where the alert should be sent. Alerts for an SLI are managed like alerts for SQL Rules.
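The example breach condition in step 2 (“check daily at 7:00 a.m. that the table has received an update in the previous 3 hours”) boils down to a simple comparison. The Python sketch below shows that logic only; `freshness_breached` is a hypothetical helper, not part of Monte Carlo, and the timestamps are invented.

```python
from datetime import datetime, timedelta


def freshness_breached(last_updated: datetime, check_time: datetime,
                       max_age: timedelta = timedelta(hours=3)) -> bool:
    """Return True if the last update is older than max_age at check time."""
    return check_time - last_updated > max_age


# The daily check runs at 7:00 a.m.
check_time = datetime(2021, 7, 1, 7, 0)

on_time = datetime(2021, 7, 1, 6, 45)  # updated 15 minutes before the check
stale = datetime(2021, 6, 30, 6, 45)   # last updated the previous morning

print(freshness_breached(on_time, check_time))  # within the 3-hour window
print(freshness_breached(stale, check_time))    # outside the 3-hour window
```

Because the rule depends only on the check time and the lookback window, it behaves the same way however erratic the table’s update history has been.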

With Monte Carlo SLIs, data teams can better monitor and alert for delayed or missed updates to key tables and reports — and in turn, achieve more trustworthy data and happier end users.

Stay tuned for future articles that dive into how to set better SLIs for data reliability and other best practices for data incident resolution.

Interested in learning more about Monte Carlo SLIs? Reach out to Will Robins and the rest of the Monte Carlo team.