
How Monte Carlo’s New GitHub Integration Helps Data Teams Detect, Resolve, and Prevent Breaking Changes Faster


Mei Tao

Product at Monte Carlo

Bad pull requests may not be the root of all evil—but they’re certainly the root of some gnarly data quality incidents. 

The good news? There’s a better way to manage them than pushing code to prod and hoping for the best.

Introducing Monte Carlo’s GitHub integration, which allows customers to easily investigate breaking changes and understand the downstream impact of new pull requests.

Extend data observability to your code

Monte Carlo’s latest integration extends data quality coverage further upstream into your PRs on GitHub, expediting incident resolution and helping teams understand the downstream impact of PRs. 

With this new feature, users can: 

  1. Quickly and easily correlate PRs on GitHub to data issues, accelerating incident resolution.
  2. Get immediate feedback on the downstream impact of PRs, allowing them to thoroughly test their changes and prevent unintended consequences downstream.

GitHub PRs break data pipelines. Call it a fact of life. Fortunately, with our new GitHub integration, you’ll be the first to know about breaking changes so you can fix them and understand their impact. 

What’s more, the integration helps data teams prevent incidents caused by PRs in the first place: a lineage check runs in seconds and shows the potential downstream impact of your changes before you commit code.
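To make the idea of a lineage check concrete, here is a minimal sketch in Python. It is not Monte Carlo’s implementation; the table names and the `LINEAGE` dictionary are made up for illustration, and the only assumption is that lineage can be represented as a graph where each table points to the tables that read from it.

```python
from collections import deque

# Hypothetical lineage graph: each table maps to the tables that read from it.
LINEAGE = {
    "raw.orders": ["staging.stg_orders"],
    "staging.stg_orders": ["analytics.fct_orders", "analytics.dim_customers"],
    "analytics.fct_orders": ["reporting.revenue_dashboard"],
    "analytics.dim_customers": [],
    "reporting.revenue_dashboard": [],
}


def downstream_of(table, lineage):
    """Return every table reachable from `table`, in breadth-first order."""
    seen = {table}  # guards against cycles and against re-listing the start table
    order = []
    queue = deque(lineage.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


# Tables that could be affected by a change to staging.stg_orders.
impacted = downstream_of("staging.stg_orders", LINEAGE)
print(impacted)
```

A breadth-first walk lists the closest dependents first, which tends to be the order in which you would want to review them.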

So, how does this play out in a real data environment?

Pull requests are tracked within the GitHub integration for easy review.

Imagine a data engineer accidentally joins the wrong table in a PR and wipes out rows in several critical downstream tables.

What happens? Most likely, you and the owners of any reports or dashboards those tables power will get into a pretty heated Slack exchange about “wrong data.” Then, you might spend anywhere from several hours to a few days finding and then reverting or fixing that bad PR.

Now, let’s imagine that same scenario with Monte Carlo’s new GitHub integration. 

A bad PR is merged and rows in a critical downstream table are lost. However, this time, data observability comes to the rescue. You receive an immediate Slack alert about the unusual row deletion and where the problem originated. That same data engineer finds the faulty PR within seconds, notifies downstream stakeholders while the fix is underway, and spares everyone stakeholder frustration, data engineer headaches, and most importantly, wasted time and resources.
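The triage step in that scenario boils down to a simple question: which recently merged PRs touched the table that just broke? The sketch below shows one way to answer it, under stated assumptions. The PR numbers, timestamps, table names, and the 24-hour look-back window are all hypothetical, and this is an illustration of the idea rather than how Monte Carlo performs the correlation.

```python
from datetime import datetime, timedelta

# Hypothetical merged PRs, each tagged with the tables its changed models build.
MERGED_PRS = [
    {"number": 101, "merged_at": datetime(2023, 5, 1, 20, 0),
     "tables": {"analytics.fct_orders"}},
    {"number": 102, "merged_at": datetime(2023, 5, 2, 14, 30),
     "tables": {"analytics.fct_orders", "analytics.dim_customers"}},
    {"number": 103, "merged_at": datetime(2023, 5, 2, 16, 0),
     "tables": {"analytics.dim_customers"}},
    {"number": 104, "merged_at": datetime(2023, 5, 2, 18, 0),
     "tables": {"analytics.fct_orders"}},
]


def candidate_prs(table, detected_at, merged_prs, window=timedelta(hours=24)):
    """PRs that touched `table` and merged within `window` before detection."""
    matches = []
    for pr in merged_prs:
        age = detected_at - pr["merged_at"]
        # Keep PRs merged before the incident, but not longer ago than the window.
        if table in pr["tables"] and timedelta(0) <= age <= window:
            matches.append(pr)
    # Newest first: the most recent change is usually the first suspect.
    return sorted(matches, key=lambda pr: pr["merged_at"], reverse=True)


suspects = candidate_prs(
    "analytics.fct_orders", datetime(2023, 5, 2, 17, 0), MERGED_PRS
)
print([pr["number"] for pr in suspects])
```

Sorting newest first is a heuristic, not a verdict: it narrows the list an engineer has to read, and the actual root cause still needs a human look at the diff.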

Get started with Monte Carlo’s GitHub integration

Example of a pull request in GitHub.

Because GitHub monitoring requires a mapping between customers’ tables, the dbt models that build them, and those models’ file locations, a dbt integration is currently required to use our GitHub integration.

Fortunately, the integration works seamlessly with both dbt Core and dbt Cloud, and if you haven’t set one up, you can check out our docs to get started with your own dbt integration.
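To see why dbt metadata matters here, consider how a changed file in a PR could be traced back to a warehouse table. The sketch below uses a trimmed, made-up dictionary shaped like the `nodes` section of dbt’s `manifest.json` artifact. The field names (`resource_type`, `original_file_path`, `database`, `schema`, `alias`) follow that artifact as we understand it, so check them against your dbt version; the project, models, and helper functions are hypothetical, and this is not a description of Monte Carlo’s internals.

```python
# Trimmed, hypothetical excerpt shaped like the "nodes" section of manifest.json.
MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "original_file_path": "models/staging/stg_orders.sql",
            "database": "analytics",
            "schema": "staging",
            "alias": "stg_orders",
        },
        "model.shop.fct_orders": {
            "resource_type": "model",
            "original_file_path": "models/marts/fct_orders.sql",
            "database": "analytics",
            "schema": "marts",
            "alias": "fct_orders",
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "original_file_path": "models/staging/schema.yml",
            "database": "analytics",
            "schema": "dbt_test__audit",
            "alias": "not_null_orders_id",
        },
    }
}


def tables_by_file(manifest):
    """Map each model's file path to its fully qualified table name."""
    mapping = {}
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, and other non-model nodes
        table = f'{node["database"]}.{node["schema"]}.{node["alias"]}'
        mapping[node["original_file_path"]] = table
    return mapping


def tables_touched(changed_files, manifest):
    """Tables built by the model files in a PR's list of changed files."""
    mapping = tables_by_file(manifest)
    return sorted(mapping[path] for path in changed_files if path in mapping)


# File paths as they might appear in a PR's changed-files list.
print(tables_touched(["models/staging/stg_orders.sql", "README.md"], MANIFEST))
```

Files that do not correspond to a model, such as the README above, simply drop out of the result.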

Once you’ve got your dbt integration set up, installing your GitHub app for Monte Carlo is just a few clicks away.

  1. In Monte Carlo, go to Settings > Integrations
  2. In “Notifications and Collaborations”, click “Create” and select “GitHub”:
The GitHub Integration is available within the Notifications and Collaborations menu of Monte Carlo’s UI.
  3. The page will navigate to the GitHub UI. Then select:
    • the GitHub organization
    • (optionally) the repositories that will be accessible to MC
  4. Click “Install and Authorize”
  5. And that’s it!

If you manage multiple GitHub organizations with relevant code, you’ll need to install the app for each organization.

Current Monte Carlo users can find the latest GitHub integration on the Integrations Settings page. For more info on setup, check out our documentation page or reach out to the Monte Carlo team for support.

What our customers have to say

“In data, incidents happen and pipelines break. But by immediately linking those incidents on key tables to a specific PR, Monte Carlo’s new GitHub integration will empower our team to troubleshoot and remediate those issues faster. We’re excited to bring data observability even further upstream, and in the process, improve data trust for our business.” – Trish Pham, Head of Analytics, PayJoy

“GitHub is an important piece of how we operate at Assurance, both within the data team and across the broader engineering organization, enabling us to build with speed. We’re excited to see Monte Carlo’s new GitHub integration, which has the potential to expedite data development time across a broad range of data roles and ensure greater data reliability in the process.” – Mitchell Posluns, Analytics Manager at Assurance

Not a Monte Carlo user yet? No sweat. Drop your email in the form below and our Data Observability experts will be in touch.