Over the past few years, many companies have embraced data platforms as an effective way to aggregate, handle, and utilize data at scale. Despite the data platform’s rising popularity, however, little literature exists on what it actually takes to successfully build one.

Barr Moses, CEO & co-founder of Monte Carlo, and Atul Gupte, former Product Manager for Uber’s Data Platform Team, share advice for designing a data platform that will maximize the value and impact of data on your organization.

Your company likes data. A lot. Your boss requested additional headcount this year to beef up your data engineering team (Presto and Kafka and Hadoop, oh my!). Your VP of Data is constantly lurking in your company’s Eng-Team Slack channel to see “how people feel” about migrating to Snowflake. Your CEO even wants to become data-driven, whatever that means. To say that data is a priority for your company would be an understatement.

To satisfy your company’s insatiable appetite for data, you may even be building a complex, multi-layered data ecosystem: in other words, a data platform.

At its core, a data platform is a central repository for all data, handling the collection, cleansing, transformation, and application of data to generate business insights. For most organizations, building a data platform is no longer a nice-to-have but a necessity, with many businesses distinguishing themselves from the competition based on their ability to glean actionable insights from their data, whether to improve the customer experience, increase revenue, or even define their brand.

Much in the same way that many view data itself as a product, data-first companies like Uber, LinkedIn, and Facebook increasingly view data platforms as “products,” too, with dedicated engineering, product, and operational teams. Despite their ubiquity and popularity, however, data platforms are often spun up with little foresight into who is using them, how they’re being used, and what engineers and product managers can do to optimize these experiences.

Whether you’re just getting started with a data platform or are in the process of scaling one, here are five best practices for avoiding these common pitfalls and building the data platform of your dreams:

Align your product’s goals with the goals of the business

It’s important to align your platform’s goals with the overarching data goals of your business. Image courtesy of John Schnobirch on Unsplash.

For several decades, data platforms were viewed as a means to an end rather than “the end,” as in, the core product you’re building. In fact, although data platforms powered many services, feeding rich insights into the applications that power our lives, they weren’t given the respect and attention they deserve until very recently.

When you’re building or scaling your data platform, the first question you should ask is: how does data map to your company’s goals?

To answer this question, you have to put on your data platform product manager hat. Unlike product managers who own a specific feature or area, a data platform product manager must understand the big picture rather than area-specific goals, since data feeds into the needs of every other functional team, from marketing and recruiting to business development and sales.

For instance, if your business’s goal is to increase revenue (go big or go home!), how does data help you achieve this goal? For the sake of this experiment, consider the following questions:

  • What services or products drive revenue growth?
  • What data do these services or products collect?
  • What do we need to do with the data before we can use it?
  • Which teams need this data? What will they do with it?
  • Who will have access to this data or the analytics it generates?
  • How quickly do these users need access to this data?
  • What, if any, compliance or governance checks does the platform need to address?

By answering these questions, you’ll have a better understanding of how to prioritize your product roadmap, as well as who you need to build for (often, the engineers) versus design for (the day-to-day platform users, including analysts). Moreover, this holistic approach to KPI development and execution strategy sets your platform up for a more scalable impact across teams.

Gain feedback and buy-in from the right stakeholders

It goes without saying that receiving both buy-in upfront and iterative feedback throughout the product development process are necessary components of the data platform journey. What isn’t as widely understood is whose voice you should care about.

Yes, you need the ultimate sign-off from your CTO or VP of Data on the finished product, but their decisions are often informed by their trusted advisors: staff engineers, technical program managers, and other day-to-day data practitioners.

While developing a new data cataloging system for her company, one product manager we spoke with at a leading transportation company spent 3 months trying to sell her VP of Engineering on her team’s idea, only to be shut down in a single email by his chief-of-staff.

Consider different tactics based on the DNA of your company. We suggest following these three concurrent steps:

  1. Sell leadership on the vision.
  2. Sell the brass tacks and day-to-day use case on your actual users.
  3. Apply a customer-centric approach, no matter who you’re talking to. Position the platform as a means of empowering different types of personas in your data ecosystem, including both your data team (data engineers, data scientists, analysts, and researchers) and data consumers (program managers, executives, business development, and sales, to name a few categories). A great data platform will enable the technical users to do their work easily and efficiently, while also allowing less technical personas to leverage rich insights or put together visualizations based on data without much assistance from engineers and analysts.
There are a variety of data personas you have to consider when you’re building a data platform for your company, including engineers, data scientists, product managers, business function users, and general managers. (Image courtesy of Atul Gupte.)

At the end of the day, it’s important that this experience nurtures a community of data enthusiasts that build, share, and learn together. Since your platform has the potential to serve the entire company, everyone should feel invested in its success, even if that means making some compromises along the way.

Prioritize long-term growth and sustainability vs. short-term gains

Data solutions with short-term usability in mind are often easier to get off the ground, but over time, end up being more costly than platforms built with sustainability in mind. (Image courtesy of Atul Gupte.)

Unlike other types of products, data platforms are not successful simply because they are “first-to-market.” Since data platforms are almost exclusively internal tools, we’ve found that the best data platforms are built with sustainability in mind rather than feature-specific wins.

Remember: your customer is your company, and your company’s success is your success. This is not to say that your roadmap won’t change several times over (it will), but when you do make changes, make them with growth and maturation in mind.

For instance, Uber’s big data platform was built over the course of five years, constantly evolving with the needs of the business; Pinterest has gone through several iterations of its core data analytics product; and leading the pack, LinkedIn has been building and iterating on its data platform since 2008!

Our suggestion: choose solutions that make sense in the context of your organization, and align your plan with these expectations and deadlines. Sometimes, quick wins as part of a larger product development strategy can help with achieving internal buy-in, as long as they’re not shortsighted. Rome wasn’t built in a day, and neither was your data platform.

Sign off on baseline metrics for your data and how you measure them

It doesn’t matter how great your data platform is if you can’t trust your data, but data quality means different things to different stakeholders. Consequently, your data platform won’t be successful if you and your stakeholders aren’t aligned on this definition.

To address this, it’s important to set baseline expectations for your data reliability, that is, your organization’s ability to deliver high data availability and health throughout the entire data life cycle. Setting clear Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for software application reliability is a no-brainer. Data teams should do the same for their data pipelines.
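To make this concrete, here is a minimal Python sketch of what an SLI and SLO for a data pipeline might look like. It assumes freshness (how recently a table was updated) is the indicator your stakeholders care about; the `FreshnessSLO` class, the `freshness_sli` function, the `orders` table name, and the thresholds are all hypothetical and should be replaced with whatever your stakeholders agree on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FreshnessSLO:
    """Hypothetical SLO: the table must be no staler than `max_staleness`
    on at least `target_ratio` of the checks in a reporting window."""
    table: str
    max_staleness: timedelta
    target_ratio: float


def freshness_sli(check_times, last_updated_times, max_staleness):
    """SLI: fraction of checks at which the table was fresh enough.

    `check_times[i]` is when the i-th check ran and `last_updated_times[i]`
    is the table's most recent update as seen by that check.
    """
    if len(check_times) != len(last_updated_times):
        raise ValueError("each check needs exactly one last-updated timestamp")
    if not check_times:
        raise ValueError("at least one check is required")
    fresh = sum(
        1
        for checked, updated in zip(check_times, last_updated_times)
        if checked - updated <= max_staleness
    )
    return fresh / len(check_times)


def meets_slo(sli, slo):
    """True when the measured SLI reaches the agreed objective."""
    return sli >= slo.target_ratio


# Illustrative values only: four daily checks, one of which finds stale data.
slo = FreshnessSLO(table="orders", max_staleness=timedelta(hours=24), target_ratio=0.95)
checks = [datetime(2024, 1, 1, 9) + timedelta(days=d) for d in range(4)]
updates = [c - timedelta(hours=h) for c, h in zip(checks, [1, 2, 30, 3])]
sli = freshness_sli(checks, updates, slo.max_staleness)
print(f"{slo.table}: SLI={sli:.2f}, meets SLO: {meets_slo(sli, slo)}")
```

The same shape works for other indicators, such as row-count volume or null rates: define what a "good" check is, measure the fraction of good checks, and compare it to the objective everyone signed off on.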

This isn’t to say that different stakeholders will have the same vision for what “good data” looks like; in fact, they probably won’t, and that’s OK. Instead of fitting square pegs into round holes, it’s important to create a baseline metric of data reliability and, as with building a new platform feature, gain sign-off on the lowest common denominator.

We suggest choosing a novel measurement, such as data downtime, that will help data practitioners across the company align on baseline quality metrics.
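As an illustration, data downtime can be tracked from a simple incident log. The sketch below assumes one common formulation, in which downtime is the number of incidents multiplied by the average time to detection plus the average time to resolution; summing those two intervals per incident gives the same total. The formulation, the `DataIncident` record, and its field names are assumptions for this example rather than definitions taken from this article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataIncident:
    """Hypothetical record of one period of missing, stale, or wrong data."""
    started: datetime   # when the data first went bad
    detected: datetime  # when someone (or something) noticed
    resolved: datetime  # when the data was trustworthy again

    @property
    def time_to_detection(self):
        return self.detected - self.started

    @property
    def time_to_resolution(self):
        return self.resolved - self.detected


def data_downtime(incidents):
    """Total downtime: sum of detection and resolution time over all incidents."""
    return sum(
        (i.time_to_detection + i.time_to_resolution for i in incidents),
        timedelta(),
    )


def downtime_ratio(incidents, window):
    """Share of the reporting window during which data was unreliable."""
    return data_downtime(incidents) / window


# Illustrative values only: two incidents in a 30-day reporting window.
incidents = [
    DataIncident(datetime(2024, 3, 1, 8), datetime(2024, 3, 1, 12), datetime(2024, 3, 1, 14)),
    DataIncident(datetime(2024, 3, 10, 0), datetime(2024, 3, 10, 6), datetime(2024, 3, 10, 18)),
]
print("downtime:", data_downtime(incidents))
print("ratio:", round(downtime_ratio(incidents, timedelta(days=30)), 4))
```

A single number like this gives stakeholders with different definitions of "good data" one baseline to sign off on, and splitting it into detection and resolution time shows where to invest first.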

Know when to build vs. buy

One of the first decisions you have to make is whether to build the platform from scratch or purchase the technology (or several supporting technologies) from a vendor.

While companies like (you guessed it) Uber, LinkedIn, and Facebook have opted to build their own data platforms, often on top of open source solutions, building doesn’t always make sense for your needs. While there isn’t a magic formula that will tell you whether to build vs. buy, we’ve found that there is value in buying until you’re convinced that:

  • The product needs to operate using sensitive/classified information (e.g., financial or health records) that cannot be shared with external vendors for regulatory reasons
  • Specific customizations are required for it to work well with other internal tools/systems
  • These customizations are niche enough that a vendor may not prioritize them
  • There is some other strategic value to building vs. buying (e.g., competitive advantage for the business or beneficial for hiring talent)

One VP of Data Engineering at a healthcare startup we spoke with noted that if he were in his 20s, he would want to build. But now, in his late 30s, he would almost exclusively buy.

“I get the enthusiasm,” he says, “But I’ll be darned if I have the time, energy, and resources to build a data platform from scratch. I’m older and wiser now — I know better than to NOT trust the experts.”

When it comes to where you could be spending your time — and more importantly, money — it often makes more sense to buy a tried and true solution with a dedicated team to help you solve any issues that arise.

What’s next?

Building a data platform is an exciting journey that will benefit from applying a product development perspective. Image courtesy of memegenerator.net.

Building your data platform as a product will help you ensure greater consensus around data priorities, standardize on data quality and other KPIs, foster greater collaboration, and, as a result, bring unprecedented value to your company.

In addition to serving as a vehicle for effective data management, reliability, and democratization, the benefits of building a data platform as a product include:

  • Guiding sales efforts (giving you insights on where to focus your efforts based on how prospective customers are responding)
  • Driving application product road maps
  • Improving the customer experience (helping teams learn what your service pain points are, what’s working, and what’s not)
  • Standardizing data governance and compliance measures across the company (GDPR, CCPA, etc.)

Building a data platform might seem overwhelming at first blush, but with the right approach, your solution has the potential to become a force multiplier for your entire organization.