COVID-19 has forced nearly every organization to adapt to a new workforce reality: remote teams. We share four key tactics for turning your distributed data team into a force multiplier for your entire company.
It’s month 6 (or is it 72? It’s hard to tell) of the global pandemic, and despite the short commute from your bedroom to the kitchen table, you’re still adjusting to this new normal.
Your team is responsible for all the same tasks (handling ad-hoc queries, fixing broken pipelines, implementing new rules and logic, etc.), but troubleshooting broken data has only gotten harder. It's difficult enough to identify the root cause of a data downtime incident when you're all 5 feet away from each other; it's 10 times harder when you're working in different time zones.
Distributed teams aren't novel; in fact, they've become increasingly common over the last few decades. But working during a pandemic is new for everyone. While this shift widens the geographic talent pool, collaborating at this scale entails unforeseen hurdles, particularly when it comes to working with real-time data.
Your daily standup only gets you so far. Here are four essential steps to managing a great distributed data team:
Document all the things
Information about which tables and columns are “good or bad” breaks down when teams are distributed. One data scientist we spoke with at a leading ecommerce company told us that it takes 9 months of working on a team to develop a spidey-sense for what data lives where, which tables are the ‘right’ ones, and which columns are healthy vs. experimental.
The answer? Consider investing in a data catalog or lineage solution. Such technologies provide one source of truth about a team’s data assets, and make it easy to understand formatting and style guidelines for data input. Data catalogs become particularly important when data governance and compliance come into play, which is top of mind for data teams in financial services, healthcare, and many other industries.
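Even before adopting a full catalog product, that "one source of truth" can be approximated with lightweight, version-controlled metadata. Here's a minimal sketch in Python; the table name, owner, and column statuses are hypothetical, purely for illustration:

```python
# Hypothetical catalog entry: table, owner, and per-column health status.
# In practice this metadata would live in a catalog tool or a shared repo.
catalog = {
    "orders_daily": {
        "owner": "data-eng",
        "description": "One row per order, refreshed nightly.",
        "columns": {
            "order_id": {"status": "healthy"},
            "revenue_usd": {"status": "healthy"},
            "ltv_v2_experimental": {"status": "experimental"},
        },
    },
}

def trusted_columns(table: str) -> list:
    """Return only the columns marked healthy, so a new teammate knows
    which fields are safe to query without nine months of spidey-sense."""
    cols = catalog[table]["columns"]
    return [name for name, meta in cols.items() if meta["status"] == "healthy"]

print(trusted_columns("orders_daily"))  # ['order_id', 'revenue_usd']
```

The point isn't the data structure itself; it's that "which columns are healthy vs. experimental" becomes something anyone can look up, rather than tribal knowledge.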
Set SLAs and SLOs for data
It's important to ensure alignment not just among data team members but with data consumers (i.e., marketing, executives, or operations teams), too. To do so, we suggest taking a page out of the site reliability engineering book and setting and aligning on clear service level agreements (SLAs) and service level objectives (SLOs) for data. SLAs that set expectations around data freshness, volume, and distribution, as well as other pillars of observability, will be crucial here.
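What does an SLO for data look like in practice? Here's a minimal sketch of a freshness check; the six-hour window and timestamps are hypothetical examples of what a team and its stakeholders might agree on:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLO: this table must have been refreshed within the last 6 hours.
FRESHNESS_SLO = timedelta(hours=6)

def meets_freshness_slo(last_loaded_at, now):
    """True if the table's most recent load falls within the agreed SLO window."""
    return now - last_loaded_at <= FRESHNESS_SLO

now = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)

# Loaded 3 hours ago: within the SLO.
fresh = meets_freshness_slo(datetime(2020, 9, 1, 9, 0, tzinfo=timezone.utc), now)

# Loaded 24 hours ago: breached, and stakeholders should be notified.
stale = meets_freshness_slo(datetime(2020, 8, 31, 12, 0, tzinfo=timezone.utc), now)
```

The value of writing the objective down as code (or config) is that "fresh" stops being a matter of opinion across time zones: the same threshold applies whether the person checking is in New York or Singapore.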
Katie Bauer, a Data Science Manager at Reddit, suggests distributed data teams maintain a central document with expected delivery dates for important projects, and review that document weekly.
“Instead of pinging my team for updates throughout the week when questions arise from stakeholders, I can easily visit this document for answers,” she said. “This keeps us focused on delivering our work and avoids unnecessary diversions.”
Invest in self-serve tooling
Investing in self-serve data tools (including cloud warehouses like Snowflake and Redshift, as well as data analytics solutions, like Mode, Tableau, and Looker) will streamline data democratization no matter the location or persona of the data user.
Similarly, self-serve version control systems help everyone stay on the same page when collaborating on larger workflows, which becomes extremely important when it comes to leveraging real-time data across time zones.
Prioritize data reliability
Industries that are responsible for managing PII and other sensitive customer information, like healthcare and financial services, have a low tolerance for mistakes. Data teams need confidence that data is secure and accurate across their pipeline, from ingestion to output. The right processes and procedures around data reliability can prevent data downtime incidents and restore trust in your data.
For many years, data quality monitoring was the primary way in which data teams caught broken data, but this isn’t cutting it anymore, particularly when real-time data and distributed teams are the norm. Our remote-first world calls for a more comprehensive solution that can seamlessly track the five pillars of data observability and other important data health metrics tailored to the needs of your organization.
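To make "tracking data health metrics" concrete, here's a minimal sketch of one such check: flagging a table whose daily row count deviates sharply from its recent history. The row counts and the three-sigma threshold are illustrative assumptions, not a prescription:

```python
from statistics import mean, stdev

def volume_anomaly(history, today, threshold=3.0):
    """True if today's row count sits more than `threshold` standard
    deviations from the recent mean, suggesting a broken pipeline."""
    mu, sigma = mean(history), stdev(history)
    return abs(today - mu) > threshold * sigma

# Hypothetical daily row counts for a table over the past week.
history = [10_000, 10_200, 9_900, 10_100, 10_050]

normal = volume_anomaly(history, 10_080)  # an ordinary day: no alert
broken = volume_anomaly(history, 1_200)   # rows likely dropped upstream: alert
```

A production observability solution would apply checks like this (and its siblings for freshness, distribution, schema, and lineage) automatically across every table, but the underlying idea is this simple: learn what normal looks like, and alert on deviations.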
Remember: it’s OK to not be OK
We hope these tips help you accept and even embrace the data world’s new normal.
On top of this more tactical advice, however, it never hurts to remember that it’s OK to not be OK. Emilie Schario, GitLab’s first data analyst who is now an internal strategy consultant, put it best: “This is not normal remote work. What it takes to be successful during a period of forced remote work in a global pandemic is different from what it means to be remote-as-usual.”