When you were growing up, did you ever play the name game? The modern data organization has something similar, and it’s called the “Bad Data Blame Game.” Unlike the name game, however, the Bad Data Blame Game is played when data downtime strikes and no amount of rhyming and dancing can save the day.
Data downtime refers to periods of time when your data is partial, erroneous, missing, or otherwise inaccurate, and nine times out of ten, you have no idea what caused it. All you know is that it’s 3 a.m., your CEO is pissed, your dashboards are wrong, and you need to fix it–stat.
After speaking to over 200 data teams, we’ve identified the major data personas involved in the Bad Data Blame Game. Maybe you recognize one or two?
In this article, we’ll introduce these roles, zero in on their hopes, dreams, and fears, and share our approach to conquering data reliability and achieving data observability at your company.
Chief Data Officer
This is Ophelia, your Chief Data Officer (CDO). Although she’s probably not (wo)manning your company’s data pipelines or Looker dashboards, Ophelia’s impact is hitched to the consistency, accuracy, relevance, interpretability, and reliability of the data her team provides.
Ophelia wakes up every day and asks herself two things. First, are different departments getting the data they need to be effective? And second, are we managing risk around that data effectively?
She would sleep much easier with a clear, bird’s-eye view showing that her data ecosystem is operating as it should. At the end of the day, if bad data gets in front of the CEO, out to the public, or to any other data consumer, she’s on the line.
Business Intelligence Analyst
Betty, the business intelligence lead or data analyst, wants a punchy and insightful dashboard she can share with her stakeholders in marketing, sales, and operations to answer their many questions about how their business functions are performing. When things go wrong at the practitioner level, Betty is the first one paged.
To ensure reliable data, she needs to answer these questions:
- Are we translating data into metrics and insights that are meaningful to the business?
- Are we confident that the data is reliable and means what we think it means?
- Is it easy for others to access and understand these insights?
Null values and duplicated entries are Betty’s archnemeses, and she’s a fan of anything that can prevent data downtime from compromising her peace of mind. She’s fatigued by business stakeholders who ask her to investigate a funny value in a report: it’s a long process to chase the data upstream and validate whether it’s right!
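As a rough illustration of the kind of check that would spare Betty some of that upstream chasing, here is a minimal sketch that counts missing values per column and flags rows sharing the same key. The function name `null_and_duplicate_report`, the `orders` sample rows, and the choice of `order_id` as the key are all hypothetical, invented for this example rather than taken from any particular tool.

```python
from collections import Counter


def null_and_duplicate_report(rows, key_fields):
    """Count missing values per column and keys that appear more than once.

    `rows` is a list of dicts (one per record); `key_fields` names the
    columns that should uniquely identify a record. Both are assumptions
    made for this sketch.
    """
    null_counts = Counter()
    key_counts = Counter()
    for row in rows:
        for column, value in row.items():
            # Treat None and empty strings as missing values.
            if value is None or value == "":
                null_counts[column] += 1
        key_counts[tuple(row.get(field) for field in key_fields)] += 1
    # Keep only the keys that occur more than once.
    duplicate_keys = {key: n for key, n in key_counts.items() if n > 1}
    return dict(null_counts), duplicate_keys


# Hypothetical sample data standing in for a table behind a dashboard.
orders = [
    {"order_id": 1, "region": "EMEA", "revenue": 120.0},
    {"order_id": 2, "region": None, "revenue": 80.0},
    {"order_id": 2, "region": None, "revenue": 80.0},
    {"order_id": 3, "region": "APAC", "revenue": None},
]

nulls, duplicates = null_and_duplicate_report(orders, key_fields=["order_id"])
print("Missing values per column:", nulls)
print("Duplicated keys:", duplicates)
```

A check like this only surfaces symptoms. It does not say where upstream the nulls or duplicates were introduced, which is the part that eats Betty’s time.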
Data Scientist
Sam, the data scientist, studied Forestry in undergrad, but decided to make the jump to industry to pay off his student loans. Somewhere between a line of Python code and a data visualization, he fell in love with data science. The rest is history.
To do his job well, Sam needs to know 1) where the data comes from and 2) that it’s reliable, because if it’s not, his team’s A/B tests won’t work and all downstream consumers (analysts, managers, executives, and customers) will suffer.
Sam’s team spends roughly 80 percent of their time scrubbing, cleaning, and understanding the context of the data, so they need tools and solutions that can make their lives easier.
Data Governance Lead
Proud owner of a seven-month-old puppy, Gerald is the company’s very first data governance specialist. He started off on the legal team, and then, when GDPR and CCPA entered the picture, eventually focused his efforts exclusively on data compliance. It’s a novel role, but one that is becoming increasingly important as the organization grows.
When it comes to data reliability, Gerald cares about 1) unified definitions of data and metrics across the company and 2) understanding who has access and visibility to what data.
For Gerald, bad data can mean costly fines, erosion of customer trust, and lawsuits. Despite the criticality of his role, he sometimes jests that it’s like accounting: “you’re only front and center if something has gone wrong!”
Data Engineer
When it comes to data reliability, Emerson, the data engineer, is at the crux of the equation.
Emerson started out as a full-stack developer at a small e-commerce startup, but as the company grew, so too did its data needs. Before she knew it, she was responsible not just for building the company’s data product but also for integrating the data sources the team relies on to make decisions about the business. Now, she’s a Snowflake expert, Power BI guru, and general data tooling whiz.
Emerson and her team are the glue that holds the company’s data ecosystem together. They implement technologies that monitor the reliability of their company’s data, and if something goes awry, she’s the one who’s paged by the analytics team at 3 a.m. to fix it. Like Betty, she’s lost countless hours of sleep because of this.
To be successful at her job, Emerson must tackle a lot of things, including:
- Designing a data platform solution that scales
- Ensuring that data ingestion is reliable
- Making the platform accessible to other teams
- Being able to fix data downtime quickly when it happens
- And above all else, making life sustainable for the entire data organization
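To make "ensuring that data ingestion is reliable" a little more concrete, the sketch below shows one simple signal a team like Emerson’s might watch: whether each table has been loaded recently enough. The table names, timestamps, and six-hour threshold are hypothetical, and a real setup would read load times from the warehouse’s metadata rather than a hard-coded dictionary.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_loaded_at, max_age, now=None):
    """Return True if the last successful load is older than `max_age`."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at > max_age


def stale_tables(last_loads, max_age, now=None):
    """Return the sorted names of tables whose last load is too old."""
    return sorted(
        name for name, loaded_at in last_loads.items()
        if is_stale(loaded_at, max_age, now)
    )


# Hypothetical load timestamps; in practice these would come from
# warehouse metadata or an orchestration tool.
check_time = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)
last_loads = {
    "orders": datetime(2024, 1, 2, 2, 30, tzinfo=timezone.utc),
    "customers": datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc),
}

late = stale_tables(last_loads, max_age=timedelta(hours=6), now=check_time)
print("Tables needing attention:", late)
```

Freshness is only one dimension of reliability, but it is a cheap one to check, and catching a stalled load before the analytics team does is what keeps the 3 a.m. pages down.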
Data Product Manager
This is Peter. He’s a data product manager. Peter got his start as a back-end developer, but made the jump to product management a few years ago. Like Gerald, he’s the company’s first-ever hire in this role, which is simultaneously exciting and challenging.
He’s up to date on all the latest data engineering and data analytics solutions, and is often called upon to make decisions on what offerings his organization needs to invest in to be successful. He knows firsthand how automation and self-serve tooling make all the difference when it comes to delivering an accessible, scalable data product.
All other data stakeholders, from analysts to social media managers, are dependent on him for building a platform that ingests, unifies, and makes accessible data from a myriad of sources to consumers all over the business. Oh, and did we mention that this data must be compliant with GDPR, CCPA, and other industry regulations? It’s a challenging role and it’s difficult to keep everyone happy–it seems like his platform is always one transformation away from what BI actually wanted.
Who is responsible for data reliability?
So, who in your data organization owns the reliability piece of your data ecosystem?
As you can imagine, the answer isn’t simple. From your company’s CDO to your data engineers, it’s ultimately everyone’s responsibility to ensure data reliability. And although nearly every arm of every organization at every company relies on data, not every data team has the same structure, and various industries have different requirements. (For instance, it’s the norm for financial institutions to hire entire teams of data governance experts, but at a small startup, not so much. And for those startups that do–we commend you!)
Below, we outline our approach to mapping data responsibilities, from accessibility to reliability, across your data organization using the RACI (Responsible, Accountable, Consulted, and Informed) matrix guidelines:
At companies that ingest and transform terabytes of data (like Netflix or Uber), we’ve found that it’s common for data engineers and data product managers to tackle the responsibility of monitoring and alerting for data reliability issues.
Barring these behemoths, the responsibility often falls on data engineers and product managers. They must balance the organization’s demand for data with what can be provided reliably. Notably, the brunt of any bad choices made here is often borne by the BI analysts, whose dashboards may wind up containing bad information or breaking because of uncommunicated changes. In very early data organizations, these roles are often combined into a jack-of-all-trades data person or a product manager.
Fortunately, there’s a better way to start trusting your data: data observability. It’s an approach that’s taking off with the most innovative companies, no matter who is ultimately responsible for ensuring data reliability in your organization.
In fact, with the right data reliability strategy, the Bad Data Blame Game is a thing of the past and full end-to-end observability is in sight.