How SeatGeek Reduced Data Incidents to Zero with Data Observability
Data downtime, unknown unknowns, and the specter of schema changes loom large for data teams of all stripes, and the team at SeatGeek was no exception.
As the only mobile ticketing marketplace built for fan experience, SeatGeek made its name on efficient customer experiences. So, when SeatGeek’s data leaders realized they were losing too much time root-causing data issues in their BI reports, they began looking for tools to help them discover their data problems faster.
In this video, SeatGeek Director of Data Engineering Brian London and Senior Analytics Engineer Kyle Shannon share the challenges they faced root-causing data anomalies and how data observability helped the SeatGeek data team reduce data failures per month from 10 to 0 in just one quarter—plus an inside peek at some of the team’s favorite Monte Carlo features and what the future holds for data at their company.
Click the play button to watch the video, or check out our summary below.
SeatGeek’s data landscape
As an online ticketing marketplace with distributed business intelligence needs, data quality plays a big part in SeatGeek’s day-to-day operations. And with two separate data teams—data platforms and data analytics—powering their internal data initiatives, SeatGeek is a company that takes its data seriously.
Despite leveraging a premium data stack and placing expert data leaders at the helm, SeatGeek’s data teams still found themselves losing full days root-causing data anomalies identified by their business users.
“Running a data platform team, really, you’re doing a good job if nobody notices you…Before Monte Carlo, the way we would find out there was a problem, most of the time, is one of the business users would post a Slack message, saying that they’re getting results that don’t make sense,” said Brian.
With an average of 10 internal data downtime issues per month (periods when data was missing, erroneous, or otherwise inaccurate), SeatGeek needed a solution that would not only enable their teams to identify and remediate anomalies faster, but also reduce the total number and types of issues they encountered.
Primary data challenges
- Data issues identified by business users were eroding trust in SeatGeek’s internal data products
- Complex data architecture was hiding unknown data quality issues that would eventually become data incidents
- Frequency and severity of data issues required support from both data platform and data analytics teams to root cause and debug
The solution: ML-enabled anomaly detection and lineage tracking
Brian and his team began exploring solutions that would enable them to root cause their internal data failures faster. They found their solution in end-to-end data observability.
Leveraging Monte Carlo’s ML-enabled anomaly detection and reporting across their data stack, SeatGeek finally had the power to identify data anomalies before they reached SeatGeek’s business users—even across third-party data sources.
And with field-level lineage showing at a glance which models feed into one another, root-causing anomalies became as simple as tracing issues back to the source. No manual querying required.
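To make the idea of tracing an issue back to its source concrete, here is a minimal sketch in Python. It is not Monte Carlo's implementation or API. The `LINEAGE` mapping, the table and column names, and the `upstream_sources` helper are all hypothetical, invented purely for illustration. The sketch assumes lineage is available as a simple mapping from each field to the fields it is derived from, and walks that mapping breadth-first until it reaches fields with no recorded upstream.

```python
from collections import deque

# Hypothetical field-level lineage: each field maps to the upstream
# fields it is derived from. All names here are invented for illustration.
LINEAGE = {
    "dashboard.revenue": ["marts.orders.total"],
    "marts.orders.total": ["staging.orders.price", "staging.orders.quantity"],
    "staging.orders.price": ["raw.orders.price"],
    "staging.orders.quantity": ["raw.orders.quantity"],
}


def upstream_sources(field, lineage):
    """Return the root fields (no recorded upstream) reachable from `field`.

    Fields are visited breadth-first, so roots are listed in the order
    they are discovered.
    """
    seen = {field}
    queue = deque([field])
    roots = []
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents:
            # Nothing upstream is recorded, so treat this field as a source.
            roots.append(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


if __name__ == "__main__":
    # If a report field looks wrong, list the source fields worth checking first.
    print(upstream_sources("dashboard.revenue", LINEAGE))
```

In this toy graph, a suspicious value in `dashboard.revenue` traces back to two raw columns. That is the kind of answer lineage provides without anyone writing exploratory queries against each intermediate model.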
Speaking about Monte Carlo’s impact on their data architecture, Brian said, “One of the things that Monte Carlo has done is enable us to stabilize our platform. So, in addition to identifying when there is a problem, it has also helped us to understand where problems are likely to occur, where things are brittle. And over time, we’ve invested effort into cleaning up our lineages, simplifying our logic.”
Key data observability tools
- Automated anomaly alerts to reduce time-to-detection and provide visibility into data quality issues in real time
- Field-level lineage to see across pipelines and eliminate manual querying in root-cause analysis
- Incident IQ to view data incidents and recommendations at a glance
- Looker integration to understand how data anomalies impact downstream users
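As a rough illustration of what an automated anomaly alert does, the sketch below flags a table whose latest daily row count deviates sharply from its recent history. This is a simple z-score check, swapped in here for illustration only. The source does not describe the ML models Monte Carlo actually uses, and the `is_anomalous` function, its `threshold` parameter, and the sample row counts are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations away from the mean of `history`.

    A deliberately simple stand-in for a learned anomaly detector.
    """
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for one table over the past week.
daily_rows = [10120, 9980, 10050, 10210, 9900, 10075, 10130]

if __name__ == "__main__":
    print(is_anomalous(daily_rows, 10090))  # a typical day
    print(is_anomalous(daily_rows, 4200))   # a sudden drop in volume
```

A check like this runs on a schedule and raises an alert when it fires, so the data team hears about a volume drop from the monitor rather than from a business user in Slack.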
The impact of implementing data observability at scale
In addition to providing efficient anomaly reporting and visibility into the health of SeatGeek’s data, data observability has had a demonstrable impact on SeatGeek’s bottom line.
Since implementing Monte Carlo at scale, SeatGeek has:
- Reduced data incidents per month from 10 to 0 in the second quarter after enabling Monte Carlo
- Improved ELT system stability, eliminating data platform-related anomalies
- Reduced resource drain from root-cause analysis by 50% and improved efficiency across all data teams
- Drastically reduced time-to-discovery, including surfacing unknown unknowns in the data before they become incidents
“This is a tool that saves a lot of time and a lot of stress for people who are on the front lines of our on-call rotation, and it does so in a way that just additional staffing couldn’t…it removes work, it doesn’t replace work,” said Brian.
What’s next for the data team at SeatGeek?
With trust in their data restored, SeatGeek’s data team has set their sights on improving their data’s approachability—and usability—at scale.
Making the data easier to find and access, while doubling down on visibility into its health, is top-of-mind as the team looks to the future.
Wondering how Monte Carlo can help reduce your data incidents? Reach out to Brandon and the rest of the Monte Carlo team to learn more.