Data Mesh 101: Everything You Need To Know to Get Started

Your company wants to build a data mesh. Great! Now what? Here’s a quick primer to get you started — and prevent your data infrastructure from turning into a hot mesh.

Since the early 2010s, microservice architectures have been adopted by companies far and wide (think: Uber, Netflix, and Airbnb, among others) as the software paradigm du jour, sparking discussion among engineering teams about the pros and cons of domain-oriented design.

Now, in 2021, you’d be hard-pressed to find a data engineer whose team isn’t debating whether or not to migrate from a monolithic architecture to a decentralized data mesh.

Developed by Thoughtworks’ Zhamak Dehghani, the data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-driven, self-serve design. 

As companies become increasingly data-driven, the data mesh lends itself well to three key elements of the modern data organization: 

  1. The appetite for more and more data, ingested and leveraged by stakeholders across the company as opposed to a lone team of “data wranglers” 
  2. The increasing complexity of data pipelines as teams seek to do more and more intelligent things with their data
  3. The rise of a standardized data observability and discoverability layer to understand the health of your data assets across their life cycle

The potential of the data mesh is simultaneously exciting and intimidating and, like the microservice architecture before it, has inspired a lot of discussion around what it takes to operationalize data at scale. 

Unlike traditional monolithic data infrastructures that handle the ETL in one central data lake, a data mesh supports distributed, domain-specific data consumers and views “data-as-a-product,” with each domain handling its own data pipelines. Underlying the data mesh is a standardized layer of observability and governance that ensures data is reliable and trustworthy at all times. Image courtesy of Monte Carlo.
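To make the split between domain-owned data products and a shared observability layer a little more concrete, here is a minimal Python sketch. It assumes that each domain publishes a small descriptor for its data product and that a central layer checks every descriptor against the freshness SLA the domain chose. The `DataProduct` class, its fields, and the `stale_products` check are hypothetical illustrations, not part of any specific tool or of Zhamak Dehghani's design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    name: str
    domain: str               # the team that owns this pipeline end to end
    owner: str                # contact accountable for the product
    freshness_sla: timedelta  # max allowed age of the latest update (set by the domain)
    last_updated: datetime


def stale_products(products: list[DataProduct], now: datetime) -> list[str]:
    """Hypothetical shared observability check: names of products breaching their SLA."""
    return [
        p.name
        for p in products
        if now - p.last_updated > p.freshness_sla
    ]


if __name__ == "__main__":
    now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
    catalog = [
        # Updated 30 minutes ago against a 1-hour SLA: healthy.
        DataProduct("orders", "checkout", "checkout-data@example.com",
                    timedelta(hours=1), now - timedelta(minutes=30)),
        # Updated 8 hours ago against a 6-hour SLA: stale.
        DataProduct("shipments", "logistics", "logistics-data@example.com",
                    timedelta(hours=6), now - timedelta(hours=8)),
    ]
    print(stale_products(catalog, now))  # ['shipments']
```

The check itself is the same for every domain, while the SLA values and ownership details live with each domain; that mirrors, in a very simplified way, the idea of decentralized ownership sitting on top of a standardized observability layer.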

To guide you on your data mesh journey, we put together our list of essential data mesh reading: 

The Basics

  • How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh – Zhamak Dehghani’s original piece is the Holy Grail of all data mesh content. Think of this article as your gateway into the rest of the data mesh canon, whetting your appetite for future discussions around opportunities, challenges, and key considerations when implementing the design in practice. Her architectural diagrams are critical for understanding how the data mesh departs from centralized architectures. 
  • Data Mesh Principles and Logical Architecture – a follow-up to Zhamak’s first piece, this article goes into more practical detail about how to actually implement a data mesh at scale, and takes a step back to explain how and why federated governance is critical to the architecture’s success. A must-read for anyone curious about the nuts and bolts of the data mesh. 
  • Data Mesh Applied – Sven Balnojan, Head of Data Analytics and Data Science at Mercateo Gruppe, walks readers through how data teams can apply a DevOps, “data-as-a-product” mindset to their data architecture by migrating from monolithic data warehouses and lakes to a data mesh. He also touches on how an average business (in this case, an e-commerce company) might go about this migration and how to democratize data ownership and access appropriately. 

Supplementary Reading

  • What is a Data Mesh — and How Not to Mesh it Up – in 2020, several customers approached my co-founder and me with questions about what it took to implement a data mesh architecture at scale, and whether or not a data mesh made sense for their team. In this beginner’s guide, we walk through some key considerations, particularly as they relate to setting your mesh up for success with data observability and discoverability. 
  • Is the Data Mesh Right for Your Organization? – in Hyperight’s latest on the topic, they interview various data leaders and consultants about the reasons why (or why not) to implement a data mesh architecture. TL;DR: if your team is already adopting a domain-oriented approach to data ownership and struggling with data management, a data mesh may be the right architecture to take your organization to the next level. A key point: companies that lean into automation and DataOps are more likely to be set up for success than companies that haven’t.
  • Introduction to Data Mesh: A Paradigm Shift in Analytics Data Management (Parts 1 and 2) – Think of these two videos as additional context for Zhamak’s earlier writing on the data mesh. In these twin talks for Starburst Data’s SuperNova conference, Zhamak goes into greater detail about her motivations behind designing this new paradigm and how best-in-class data teams are already applying the data mesh at scale (with automation) to deliver more reliable, actionable insights to their companies. 

Primary Sources

  • Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake – Max Schultze, a data engineer at Zalando, and Arif Wider, a ThoughtWorks consultant, discuss how the fashion e-commerce company turned their “data swamp” into a domain-driven, operationalized data lake by leveraging data mesh principles. For those serious about decentralizing their data architecture and removing data engineering bottlenecks (whether or not you’re going full mesh), this is a must-watch. 
  • Intuit’s Data Mesh Strategy – Tristan Baker, Chief Architect of Intuit’s Data Platform, discusses why and how Intuit decided to implement a data mesh architecture to, as he puts it, “decrease pandemonium and increase productivity to get back to the business of making customers happy.” According to Tristan, key challenges included data discoverability, data understandability, and data trust. By organizing the code and data as a “data product,” Intuit was able to set clear data responsibilities, service ownership, and target outcomes. 
  • Netflix Data Mesh: Composable Data Processing – in this video from Flink Forward 2020, Justin Cunningham, Director of Data Architecture at Netflix, discusses how his team built a data mesh architecture to specifically handle composable data processing. Unlike other talks and articles, this presentation goes into the nitty-gritty details about the ways in which they applied a data mesh framework to handle one element of the data transformation process: moving data between Netflix systems. 

This list is by no means exhaustive, but it should get you started on your data mesh journey. And for those curious about building a data mesh or looking to share best practices, consider joining the Data Mesh Learning Slack group.

Until next time — here’s wishing you data mesh magic! 

Is your company building a data mesh? Reach out to Barr Moses with your experiences, tips, and pain points. We’d love to hear from you!