Delivering on Data Quality: How Choozle Reduced Data Downtime by 88% with Monte Carlo
In October 2021, Choozle launched a major platform update that gives its 6,000+ advertisers and agencies holistic insight into campaign performance by surfacing more data and letting them connect to outside platforms such as Google Analytics, Facebook, AdWords, and more.
The upgrade also widened the range of campaign performance data that users can access and expanded how they can interact with it in the Choozle platform.
“Snowflake gave us the ability to have all of the information available to our users. For example, we could show campaign performance across the top 20 zip codes and now advertisers can access data across all 30,000 zip codes in the US if they want it,” said Adam Woods, Chief Technology Officer, Choozle. “The release was very clean and we have seen a huge swing in customer satisfaction with our new reporting capabilities.”
The Challenge: Table Sprawl and Fragmentation
Choozle’s main data sources are demand-side platforms (DSPs), through which it facilitates the buying of media on behalf of its users. These data sources are static with defined ingest patterns and are powered by ELT pipelines that bring data into a unified schema, where Choozle abstracts away the differences in how each platform presents its data.
That predictability changed, and a data quality challenge arose, when Choozle released its powerful unified reporting capability, which allows users to connect external media sources.
“When our advertisers connect to Google, Bing, Facebook, or another outside platform, Fivetran goes into the data warehouse and drops it into the reporting stack fully automated. I don’t know when an advertiser has created a connector,” said Adam. “This created table sprawl, proliferation, and fragmentation. We needed data monitoring and alerting to make sure all of these tables were synced and up-to-date, otherwise we would start hearing from customers.”
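The kind of freshness check Adam describes can be sketched in a few lines. This is a minimal illustration only, not Choozle's or Monte Carlo's implementation: the table names, the `last_synced` mapping, and the six-hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_stale_tables(last_synced, max_age, now=None):
    """Return the sorted names of tables whose last sync is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_synced.items() if now - ts > max_age)


# Hypothetical example data: table name -> time of last successful sync.
now = datetime(2021, 10, 15, 12, 0, tzinfo=timezone.utc)
last_synced = {
    "google_ads_campaigns": now - timedelta(minutes=30),
    "facebook_ads_insights": now - timedelta(hours=9),
    "bing_ads_keywords": now - timedelta(hours=2),
}

# Flag anything that has not synced within the (assumed) six-hour window.
stale = find_stale_tables(last_synced, max_age=timedelta(hours=6), now=now)
print(stale)
```

In practice the timestamps would come from warehouse metadata rather than a hard-coded dictionary, which matters when connectors, and therefore tables, appear without the data team being told.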
Any issues in data quality would impact customers’ willingness to trust the platform.
“Advertisers depend on data visibility so they can quickly address issues and proactively optimize performance,” said Adam. “If we have incidents that impede that visibility, we will hear from customers immediately.”
As a result, Adam and the Choozle data engineering team decided to proactively evaluate data observability solutions to ensure that their data met necessary quality and reliability standards.
“In a previous role, we had a very complex data stack and experienced complications around our data quality, so I have a predisposition to be sensitive to this challenge,” said Adam. “One of my top priorities is to identify and fix any issues before they impact customers. We were going to be proactive and buttoned up on this release.”
The Solution: Data Observability with Monte Carlo
During the evaluation process, Choozle’s key requirements focused on ease of integration and time-to-value.
“We had previously built a lot of custom monitors to see if cron jobs were failing with alerting and escalation through PagerDuty at a job level, but what stuck out to me was Monte Carlo had such a tight integration with Snowflake that it just auto-discovers every possible issue with our data,” said Adam. “The setup was crazy easy for the coverage we got. I didn’t see any competitors that could serve our needs like Monte Carlo does.”
Adam also ruled out open source solutions due to the associated maintenance costs.
“I understand the instinct to turn to open source, but I actually have a lower cost of ownership with a tool like Monte Carlo because the management burden is so low and the ecosystem works so well together. After one phone call with the Monte Carlo team, we were connected to our data warehouse, and we had data observability a week later,” said Adam. “I love that with Snowflake and Monte Carlo my data stack is always up-to-date and I never have to apply a patch. We are able to reinvest the time developers and database analysts would have spent worrying about updates and infrastructure into building exceptional customer experiences.”
For Adam, one of the biggest benefits of Monte Carlo has been the proactive alert monitors and integration with Slack.
“The alerting through Slack gives me immediate visibility. I can see when a developer picks up an issue for investigation and when it has been resolved,” said Adam. “Monte Carlo alerts are high quality. We don’t get many false alarms, which really helps build a culture of urgency to event management and response.”
Monte Carlo has also given the Choozle team deeper visibility into issues that otherwise may not have been proactively caught.
“Without a tool like this, we might have monitoring coverage on final resulting tables, but that can hide a lot of issues,” said Adam. “You might not see something pertaining to a small fraction of the tens of thousands of campaigns in that table, but the advertiser running that campaign is going to see it. With Monte Carlo we are at a level where we don’t have to compromise. We can have alerting on all of our 3,500 tables.”
The Monte Carlo platform also provides field-level lineage and centralized data cataloging that allows teams to better understand the accessibility, location, health, and ownership of their data assets, as well as adhere to strict data governance requirements.
“There have definitely been times in using the platform that being able to see the full lineage has helped us understand how the data got to a certain point,” said Adam. “It is a nice layer of extra documentation.”
The Results: 88% Reduction In Data Downtime
Every organization experiences some level of data downtime: periods of time when data is partial, erroneous, missing, or otherwise inaccurate. By leveraging Monte Carlo, Choozle has reduced its data downtime by approximately 88%.
“We see about 2 to 3 real incidents every week of varying severity. Those issues are resolved in an hour whereas before it might take a full day,” said Adam. “When you are alerted closer to the time of the breakage it’s a quicker cognitive jump to understand what has changed in the environment.”
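To make the arithmetic behind a figure like this concrete, the sketch below compares weekly downtime before and after. It is illustrative only: the article does not say how the 88% was calculated. Treating "a full day" as an 8-hour working day and using 2.5 incidents per week as the midpoint of "2 to 3" are both assumptions.

```python
def downtime_reduction(hours_before, hours_after):
    """Fractional reduction in downtime going from hours_before to hours_after."""
    if hours_before <= 0:
        raise ValueError("hours_before must be positive")
    return 1 - hours_after / hours_before


INCIDENTS_PER_WEEK = 2.5          # assumption: midpoint of "2 to 3"
HOURS_PER_INCIDENT_BEFORE = 8.0   # assumption: "a full day" = 8 working hours
HOURS_PER_INCIDENT_AFTER = 1.0    # from the quote: "resolved in an hour"

weekly_before = INCIDENTS_PER_WEEK * HOURS_PER_INCIDENT_BEFORE
weekly_after = INCIDENTS_PER_WEEK * HOURS_PER_INCIDENT_AFTER
reduction = downtime_reduction(weekly_before, weekly_after)
print(f"{weekly_before} h/week -> {weekly_after} h/week ({reduction:.1%} reduction)")
```

Under these assumptions the reduction comes to 87.5%, in line with the roughly 88% reported. Because the incident count cancels out, the result depends only on the per-incident hours, and reading "a full day" as 24 hours would give a larger figure.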
For example, the Choozle team saw immediate results when it recently brought a new stream of media into production. The team had missed a field containing the primary key that uniquely identifies each table record.
“We expect that field to not be null 100% of the time. We started getting a handful of campaigns, about .2%, where that value was null,” said Adam. “That led us to look at the situation and address it before anyone noticed there were certain campaigns with certain views not seeing any data. That type of problem could have magnified over time and filled tables with all kinds of junk. With Monte Carlo, our time-to-detection in this case was accelerated from days to minutes.”
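A check of the kind described here, where a field is expected to be non-null in every record, can be sketched as follows. The record layout, the `campaign_key` field name, and the zero-tolerance threshold are hypothetical, and this is not a description of how Monte Carlo implements its monitors.

```python
def null_rate(records, field):
    """Fraction of records where `field` is missing or None (0.0 if no records)."""
    if not records:
        return 0.0
    nulls = sum(1 for r in records if r.get(field) is None)
    return nulls / len(records)


def check_not_null(records, field, max_null_rate=0.0):
    """Return (passed, rate); the check fails when rate exceeds max_null_rate."""
    rate = null_rate(records, field)
    return rate <= max_null_rate, rate


# Hypothetical data: 1,000 campaign rows, 2 of them with a null key (0.2%).
rows = [{"campaign_key": i} for i in range(998)]
rows += [{"campaign_key": None}, {"campaign_key": None}]

passed, rate = check_not_null(rows, "campaign_key")
print(f"passed={passed}, null rate={rate:.1%}")
```

A strict threshold of zero suits a primary key, where even a small fraction of nulls, like the 0.2% in the quote, signals a real problem rather than noise.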
For Adam, Monte Carlo has become an integral part of the modern data stack and a must for Snowflake users.
“I can’t imagine a situation where I would fire up Snowflake and not put Monte Carlo on top of it,” said Adam. “It’s become part of my go-to stack along with Looker and machine learning in GCP.”
Choozle is now focused on its data governance initiatives and on bringing the most advanced privacy-compliant techniques to the midmarket as regulations like CCPA and GDPR weigh on advertisers.
“Data bunkers are great if you are a massive company, but when you are a mid-sized organization living in a world with a high compliance bar, you need another type of solution,” said Adam. “Putting data observability in place was an important part of that journey and I’d recommend it as a first step prior to data catalogs and other investments.”
Adam’s advice for other data professionals is to be bold: the rise of cloud data warehouses like Snowflake and SaaS platforms like Monte Carlo has greatly reduced the risk of innovation.
“A decade ago when I was running warehouses on-prem, it was very hard to innovate. Every answer to a new suggestion was a no because you don’t want to break the whole warehouse. I encourage my developers to be more aggressive. You aren’t going to break Snowflake by writing a bad query, and if you did we would know it right away,” said Adam. “It’s the same with Monte Carlo. If you already have Snowflake, there is almost no risk in trying. The cost to get set up was the salary for three people for an hour and we saw value immediately.”
Learn more about how Choozle built an advertising platform on Snowflake in the episode of “Powered By” below.