“Hiring data engineers is a piece of cake!” said no one ever.
And for good reason.
While it’s never been easy to recruit engineers, period, data engineers are a whole new ballgame. Despite the hypergrowth of the data engineering profession, hiring the backbone of your data team has never been more challenging.
Why? In my opinion, it boils down to two key factors:
- Supply doesn’t meet demand. The data ecosystem is evolving fast, and more and more companies are betting big on data. As a result, data teams are either training existing analysts and scientists to build and scale infrastructure or looking externally to fill these critical roles. Any way you slice it, there aren’t enough data engineers to go around.
- It’s hard to find the right candidates. The title “data engineer” is relatively new, but the work of data engineering has been around much longer. Data scientists, analysts, database engineers, data architects, and back-end engineers have been handling the ingestion, transformation, testing, and infrastructure work required to build and scale data platforms for years; if you haven’t read Maxime Beauchemin’s landmark article about this phenomenon, I highly recommend it. Given the diversity of backgrounds that make up the data engineering talent pool, it can be tough to source the right candidates, and even more difficult for candidates to determine from a job description alone whether they’re a good fit for your team.
Compensation aside, what matters most when it comes to recruiting great data engineers is whether joining your company will set them up for success, both in their careers and at the company itself. At the end of the day, a competitive salary and an impressive logo may get data engineers in the door, but they won’t guarantee that those engineers stick around.
Here are 5 non-obvious ways some of the best data engineering leaders attract and retain top talent:
You measure customer impact
One of my biggest peeves with traditional engineering career ladders is that the promotions process tends to favor the bold, and not always for the better. By that, I mean it rewards high-visibility projects over behind-the-scenes or infrastructure work that is equally critical but often more time- and resource-intensive than feature engineering, particularly when you’re on a lean team.
Similarly, data engineering work often goes unrecognized or is underappreciated compared to analytics or data science work. (If you haven’t read it yet, I highly encourage you to read this spicy Hacker News thread and corresponding article about this topic. You probably know which one I’m talking about…).
How do you overcome this hurdle as a data engineering leader? It starts with customer impact. Before you even begin recruiting, identify tangible metrics that map data engineering work to its impact on your team’s internal customers. People want to work where their efforts are valued, so aligning stakeholders and leadership on the role data engineering plays in the broader data team’s strategy is critical.
According to Javier Pelayo, Director of Platform Engineering, Big Data, at Adidas, setting benchmarks with your executive leadership requires not just a strong vision but also a way to tie the impact of the data platform and other data products to the company’s larger technology and business goals.
One way to measure this? Metrics around reducing operational overhead and decreasing time to value for data products.
“In the end, if you are providing any value and you are minimizing the operational overhead for analytics teams and other data users, it’s easy to quantify the value of data engineering to stakeholders and the broader company. From there, recognition and adoption of your team’s data products will follow.”
Other suggestions for metrics to track:
- How much time/money/resources do your data engineers save for the company?
- How does their work decrease time-to-value for significant data products?
- In what ways does their work impact larger engineering or analytic strategies?
With these data points in place, you’re in a better position to highlight to prospective hires how data engineering work is prioritized, celebrated, and foundationally important to the success of your company.
They can grow with your company
Like many tech specialties, data engineering is evolving at a rapid pace. New tools and best practices are constantly emerging, and the best engineers will want to invest time in staying abreast of the latest developments and technologies. And, just like any prospective new hire, data engineers will want to know how they’ll grow with your company.
Nowadays, the onus is on data engineers to own more and more of the end-to-end data transformation workflow, from ingestion to modeling, which requires both a broader understanding of the space and more specialized skills.
Ebunoluwa Oke, Data Product Manager at SkipTheDishes, a Canadian food delivery marketplace that is part of Just Eat Takeaway.com, is working with the Data Product and Tech team to lead her team’s migration to a decentralized data architecture, in which embedded practitioners own the entire data engineering workflow from end to end.
“Any data engineer we hire now and in the future would need to be able to do the end-to-end process of ingesting the data and also creating the data model because we want that engineer to understand the context of the data, not just in the sense of ‘oh this looks wrong,’ but also, in the context of how the data is being used and who is using it,” she said.
With this increased responsibility, it’s important that prospective hires know their work will be acknowledged and tied directly to bottom-line KPIs and success metrics for the broader data team. At many companies, this isn’t the case. It’s no surprise that data engineers are often referred to as the “plumbers” of data science: an undoubtedly important and certainly underappreciated role that, quite literally, keeps your pipe(line)s running.
Investing in career development isn’t just about writing blog articles and speaking at conferences; it’s about setting your direct reports up with a career they can grow into as the team scales.
Your data ecosystem is easy to navigate
These days, it’s a cliché (and a P0) for businesses to call themselves “data-driven.” Experienced data engineers have likely seen firsthand that a company may talk a big game about data but not actually put in the work to educate its employees on how to work with data products, or even to enable self-serve access to data.
Without a foundational understanding of where to find data, how to interpret it, and how to troubleshoot basic issues, internal data consumers can take up a significant amount of your data engineering team’s time with one-off requests and last-minute demands. And while providing guidance and overseeing governance is often part of the job description, the best and brightest data engineers will recognize the difference between a company that talks the talk and one that walks the walk by investing in data literacy and leveraging a data platform that enables contextual understanding across domains.
Lars Meisingseth, Data Platform Lead with Norwegian Public Roads, suggests that setting the stage for a strong culture of data discovery and accountability will set up your new data engineers and analysts for success once they get started.
“We’re in the process of developing and operationalizing the use of a social framework to discern what data sets are important and which ones aren’t as important, along various dimensions of ‘Who is using this? How often is it used? How often is this being used? What’s the value potential both internally and externally?’ etc., to determine what data should be prioritized and shared across the organization and what can potentially be deprecated. This will also vary across domains and business use cases, but by leveraging a domain-oriented data platform, it will be easier to empower teams to make those judgement calls themselves.”
This also helps foster a culture of valuing data—and when your data engineers work in an environment where their contributions are appreciated, they’re more likely to be satisfied and excited to grow alongside the rest of the team.
You invest in the right tooling
Another common challenge? Using legacy or outdated solutions to try to build a “modern data stack.”
Don’t get me wrong. You don’t have to be Airbnb, Netflix, or Uber to build a good data platform and use fancy tools. At its core, a data platform is just a central repository for all data, handling the collection, cleansing, transformation, and application of data to generate business insights. But this is only possible if you have the right technologies and processes in place to operationalize it at scale.
In addition to the business case for building a data platform (increased innovation, competitive advantage, decentralized data ownership, scaling data analytics, etc.), there is a strong argument for using your data platform and its underlying technologies as a hiring tool for data engineers.
Recently, there’s been a lot of discussion around whether to go with open source or closed source solutions (the recent dialogue between Snowflake and Databricks’ marketing teams really brings this to light) when it comes to building your data stack. In my opinion, regardless of what side you take, both open source (Apache Spark, Airflow, dbt, etc.) and best-in-class SaaS solutions (Snowflake, Redshift, Fivetran, Looker, Tableau, etc.) are excellent marketing tools for your company. In fact, most data engineering teams invest in a mix of both.
At the end of the day, your practitioners want tools and skill sets that will (1) grow alongside their careers, (2) make their lives easier, and (3) reduce complexity, and the decision between open source and SaaS is rarely an either/or. When choosing a new solution or technology, the conversation should always seek to answer: “Is this the right tool for the job and for my team?” Sometimes that’s open source, and sometimes that’s SaaS; it really just depends on what you’re trying to achieve and what resources you’re willing to invest.
They have autonomy
A good data engineering leader will give their team the agency and influence to design, build, and scale projects in a way that interests them and pushes their technical and collaborative boundaries.
Often, this boils down to ensuring that they feel ownership over what they’re working on, and that what they’re working on is something that will advance their careers and increase their exposure to cross-functional teams across the company.
“If your data engineers are just running pipelines, that’s not very interesting work,” Sanderson said. “But if they’re building the systems and technologies that power the data platform, that’s much more compelling.”
Encouraging autonomy starts with building a culture of trust in your team’s ability to own and execute on the projects delegated to them, and with asking for their input on what they want to work on before they even join your company. By keeping the lines of communication open, proactively seeking feedback, and encouraging discussion even during the recruiting process, you can set the stage for a collaborative, trusting relationship moving forward, whether or not they choose to work at your company.
The data world is small. You never know when you’ll cross paths next. Heck, maybe they’ll even be your manager one day.
They feel challenged
This brings me to my next point: engineers are less likely to join your team if the work they’d be responsible for doesn’t challenge them, either on a technical or cultural level. Whether it’s rolling out a critical new feature, leading the charge on an RFC, or serving as point-of-contact for the implementation of a new tool, it’s critical to ensure that they’re pushing their professional boundaries.
An important part of making sure that your engineers are working on projects that challenge and excite them? Automating rote or manual work, whenever possible.
There’s a common saying among DevOps and Site Reliability Engineers (the predecessors of data engineers, in many ways) that the goal of reliable systems is to reduce manual toil. In the context of data engineering, manual toil might include fielding ad-hoc queries from downstream stakeholders, unit testing, data quality checks, and documentation. All of these activities, and many more, can be automated with the right approach.
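As a rough illustration of what automating one of these checks can look like, here is a minimal Python sketch of a scheduled data quality check that flags high null rates and stale loads, so nobody has to eyeball the table by hand. It assumes you already have a summary of a table’s row count, per-column null counts, and last load time; the `TableStats` and `check_table` names and the default thresholds are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class TableStats:
    """Hypothetical snapshot of a table's health; field names are illustrative."""
    row_count: int
    null_counts: Dict[str, int]  # column name -> number of null values
    last_loaded_at: datetime     # timezone-aware timestamp of the latest load


def check_table(
    stats: TableStats,
    max_null_rate: float = 0.01,                   # assumed threshold: 1% nulls
    max_staleness: timedelta = timedelta(hours=24),  # assumed freshness SLA
    now: Optional[datetime] = None,
) -> List[str]:
    """Return a list of human-readable failures (empty if the table looks healthy)."""
    now = now or datetime.now(timezone.utc)
    failures: List[str] = []

    # Volume check: an empty table is almost always a broken load.
    if stats.row_count == 0:
        failures.append("table is empty")
    else:
        # Completeness check: compare each column's null rate to the threshold.
        for column, nulls in stats.null_counts.items():
            rate = nulls / stats.row_count
            if rate > max_null_rate:
                failures.append(f"{column}: null rate {rate:.1%} exceeds {max_null_rate:.1%}")

    # Freshness check: flag tables that haven't been loaded recently enough.
    age = now - stats.last_loaded_at
    if age > max_staleness:
        failures.append(f"stale: last load was {age} ago")

    return failures


if __name__ == "__main__":
    # Example run with made-up numbers; in practice this would run on a schedule
    # and route failures to an alerting channel instead of printing them.
    now = datetime(2024, 1, 2, 12, tzinfo=timezone.utc)
    stats = TableStats(
        row_count=1000,
        null_counts={"order_id": 0, "email": 50},
        last_loaded_at=datetime(2024, 1, 1, 6, tzinfo=timezone.utc),
    )
    for failure in check_table(stats, now=now):
        print(failure)
```

Returning a list of messages rather than raising on the first problem is a deliberate choice in this sketch: one run surfaces every issue at once, which is what you want when the goal is to replace a human doing the same review manually.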
As the Head of Data Engineering at an e-commerce company aptly told me, data engineers do not enter the field to test pipelines all day.
“Engineers don’t love this work—it’s not the innovative problem-solving they probably got into the field to accomplish in the first place. It can be mundane to write use case tests every week, every time an API changes.”
Senior or staff engineers in particular will find tremendous value in automation that allows them to spend their time and energy on designing, building, and innovating, rather than maintaining and troubleshooting existing systems. End-to-end data lineage, data observability, and data discovery tools can help you build a more reliable and fault-tolerant data stack while reducing the toil of manual testing and documentation. Some solutions allow for both ML-generated observability and custom rule-setting, depending on your needs.
Other teams choose to invest in automated data ingestion (e.g., Fivetran or Stitch) or augmented analytics. Do whatever it takes to let your data engineers focus on work that advances their careers and professional fulfillment rather than on the mundane or repetitive.
Hiring the next generation of data engineers
Creating an environment that will draw in the best and brightest data engineers will give you the edge when it comes to recruiting and set the stage for a thriving data organization for years to come.
Building a data engineering team isn’t easy, particularly given how rapidly the industry is evolving, but in my opinion, that’s what makes this experience so rewarding—for managers and new hires alike.
Did I miss anything? Feel free to reach out to Lior Gavish or the rest of the Monte Carlo team to share your insights, feedback, and tips.