Updated Aug 25 2020

4 Essential Tactics for Managing a Great Distributed Data Team

Will Robins

Will Robins is a member of the founding team at Monte Carlo.

COVID-19 has forced nearly every organization to adapt to a new workforce reality: remote teams. We share four key tactics for turning your distributed data team into a force multiplier for your entire company.

It’s month 6 (or is it 72? It’s hard to tell) of the global pandemic, and despite the short commute from your bedroom to the kitchen table, you’re still adjusting to this new normal.

Your team is responsible for all the same tasks (handling ad-hoc queries, fixing broken pipelines, implementing new rules and logic, etc.), but troubleshooting broken data has only gotten harder. It’s difficult enough to identify the root cause of a data downtime incident when you’re all 5 feet away from each other; it’s 10 times harder when you’re working in different time zones.

Distributed teams aren’t novel; in fact, they’ve become increasingly common over the last few decades. But working during a pandemic is new for everyone. While this shift widens the geographic talent pool, collaborating at this scale brings unforeseen hurdles, particularly when it comes to working with real-time data.

Your daily standup only gets you so far. Here are four essential tactics for managing a great distributed data team:

Document all the things

Tribal knowledge about which tables and columns are “good” or “bad” breaks down when teams are distributed. One data scientist we spoke with at a leading ecommerce company told us that it takes 9 months of working on a team to develop a spidey-sense for what data lives where, which tables are the “right” ones, and which columns are healthy vs. experimental.

The answer? Consider investing in a data catalog or lineage solution. Such technologies provide a single source of truth about a team’s data assets and make it easy to understand formatting and style guidelines for data input. Data catalogs become particularly important when data governance and compliance come into play, which are top of mind for data teams in financial services, healthcare, and many other industries.
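To make the idea concrete, here is a minimal sketch of the kind of record a catalog keeps so that a new teammate can ask “which table should I use?” instead of waiting 9 months to find out. It assumes a simple in-memory store; the names `Status`, `TableEntry`, and `Catalog` are hypothetical and do not correspond to any particular catalog product’s API. Real catalog and lineage tools also track ownership history, lineage between tables, and much more.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical trust levels a team might assign to a table."""
    CERTIFIED = "certified"
    EXPERIMENTAL = "experimental"
    DEPRECATED = "deprecated"


@dataclass
class TableEntry:
    name: str
    owner: str
    status: Status
    description: str = ""
    # Column name -> short note, e.g. whether the column is healthy or experimental.
    columns: dict[str, str] = field(default_factory=dict)


class Catalog:
    """A tiny in-memory catalog: one shared place to look up table metadata."""

    def __init__(self) -> None:
        self._tables: dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        # Re-registering a name overwrites the previous entry.
        self._tables[entry.name] = entry

    def lookup(self, name: str) -> TableEntry | None:
        return self._tables.get(name)

    def certified(self) -> list[str]:
        """Sorted names of tables the team has marked as the 'right' ones to query."""
        return sorted(n for n, e in self._tables.items() if e.status is Status.CERTIFIED)


catalog = Catalog()
catalog.register(TableEntry("orders", "data-eng", Status.CERTIFIED,
                            columns={"amount_usd": "healthy"}))
catalog.register(TableEntry("orders_v2_test", "analytics", Status.EXPERIMENTAL))
print(catalog.certified())  # ['orders']
```

The design choice worth copying is the explicit status field: it turns an individual’s spidey-sense into something every time zone can read.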

Set SLAs and SLOs for data

It’s important to ensure alignment not just among data team members but with data consumers (e.g., marketing, executives, or operations teams), too. To do so, we suggest taking a page out of the site reliability engineering book: set, and agree on, clear service level agreements (SLAs) and service level objectives (SLOs) for data. SLAs that set expectations for data freshness, volume, and distribution, as well as other pillars of observability, will be crucial here.
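One way to make these agreements checkable, rather than living only in a slide deck, is to encode each SLO as data and evaluate it on a schedule. The sketch below covers two of the pillars named above, freshness and volume. It assumes you can already query a table’s last load time and row count; `DataSLO`, `check_slo`, and the thresholds are hypothetical, illustrative values rather than recommendations.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataSLO:
    """Hypothetical SLO for one table, agreed on with its data consumers."""
    table: str
    max_staleness: timedelta  # freshness: the newest load may be at most this old
    min_rows: int             # volume: fewest rows expected per load
    max_rows: int             # volume: most rows expected per load


def check_slo(slo: DataSLO, last_loaded_at: datetime, row_count: int,
              now: datetime) -> list[str]:
    """Return human-readable violations; an empty list means the SLO is met."""
    violations = []
    staleness = now - last_loaded_at
    if staleness > slo.max_staleness:
        violations.append(f"{slo.table}: stale by {staleness - slo.max_staleness}")
    if not slo.min_rows <= row_count <= slo.max_rows:
        violations.append(
            f"{slo.table}: {row_count} rows outside [{slo.min_rows}, {slo.max_rows}]"
        )
    return violations


slo = DataSLO("orders", max_staleness=timedelta(hours=6), min_rows=10_000, max_rows=50_000)
now = datetime(2020, 8, 25, 12, 0, tzinfo=timezone.utc)
print(check_slo(slo, now - timedelta(hours=2), 25_000, now))  # []
print(check_slo(slo, now - timedelta(hours=8), 500, now))
# ['orders: stale by 2:00:00', 'orders: 500 rows outside [10000, 50000]']
```

Returning a list of readable messages, instead of a single boolean, means the same check can feed a dashboard or an alert channel that teammates in any time zone can act on without a follow-up ping.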

Katie Bauer, a Data Science Manager at Reddit, suggests distributed data teams maintain a central document with expected delivery dates for important projects, and review that document weekly.

“Instead of pinging my team for updates throughout the week when questions arise from stakeholders, I can easily visit this document for answers,” she said. “This keeps us focused on delivering our work and avoids unnecessary diversions.”

Invest in self-serve tooling

Investing in self-serve data tools (including cloud warehouses like Snowflake and Redshift, as well as data analytics solutions like Mode, Tableau, and Looker) will streamline data democratization no matter the location or persona of the data user.

Similarly, self-serve version control systems help everyone stay on the same page when collaborating on larger workflows, which becomes extremely important when leveraging real-time data across time zones.

Prioritize data reliability

Industries that are responsible for managing PII and other sensitive customer information, like healthcare and financial services, have a low tolerance for mistakes. Data teams need confidence that data is secure and accurate across their pipeline, from consumption to output. The right processes and procedures around data reliability can prevent data downtime incidents and restore trust in your data.

For many years, data quality monitoring was the primary way in which data teams caught broken data, but this isn’t cutting it anymore, particularly when real-time data and distributed teams are the norm. Our remote-first world calls for a more comprehensive solution that can seamlessly track the five pillars of data observability and other important data health metrics tailored to the needs of your organization.
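Beyond freshness and volume, another health signal that often breaks pipelines silently is an unexpected schema change, such as a column dropped or retyped upstream. Here is a minimal sketch of such a check, assuming you can fetch a table’s columns as a `{column_name: data_type}` mapping (for example, from your warehouse’s information schema). `compare_schemas` is a hypothetical helper, not part of any observability product, and the sample schemas are made up.

```python
from __future__ import annotations


def compare_schemas(expected: dict[str, str],
                    observed: dict[str, str]) -> dict[str, list[str]]:
    """Diff two {column_name: data_type} mappings and report what changed."""
    added = sorted(set(observed) - set(expected))      # new columns upstream
    removed = sorted(set(expected) - set(observed))    # columns that disappeared
    retyped = sorted(                                  # same name, different type
        col for col in set(expected) & set(observed)
        if expected[col] != observed[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


expected = {"order_id": "INTEGER", "amount": "NUMERIC", "created_at": "TIMESTAMP"}
observed = {"order_id": "INTEGER", "amount": "VARCHAR", "channel": "VARCHAR"}
print(compare_schemas(expected, observed))
# {'added': ['channel'], 'removed': ['created_at'], 'retyped': ['amount']}
```

Storing the expected schema alongside the catalog entry for each table lets anyone on the team, in any time zone, see exactly what changed rather than reverse-engineering it from a broken dashboard.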

Remember: it’s OK to not be OK

We hope these tips help you accept and even embrace the data world’s new normal.

On top of this more tactical advice, however, it never hurts to remember that it’s OK to not be OK. Emilie Schario, GitLab’s first data analyst who is now an internal strategy consultant, put it best: “This is not normal remote work. What it takes to be successful during a period of forced remote work in a global pandemic is different from what it means to be remote-as-usual.”
