Data Reliability at Scale: How Fox Digital Architected its Modern Data Stack

Fox is more than just a news, media and sports powerhouse; they’re also behind one of the media industry’s most advanced data architectures. Here’s how the early adopters of AWS Redshift, Kinesis and Apache Spark are democratizing data across the organization with data observability in mind. 

Companies are ingesting and storing an incredible amount of data, but not every organization knows how to realize its full value.

Data often gets stuck in silos, with requests backing up in ticket queues that never reach overworked data engineers and analysts struggling to serve the needs of their entire organization.

As VP of Data Services at media giant Fox Networks, Alex Tverdohleb has spent the last several years focusing on this problem. He intentionally built out the teams, the technology, and the trust necessary to give internal stakeholders across the digital organization the freedom to conduct ad-hoc analytics with as little involvement from centralized data engineers and analysts as possible.

As distributed architectures continue to become a new gold standard for data-driven organizations, this kind of self-serve motion would be a dream come true for many data leaders. So when the Monte Carlo team got the chance to sit down with Alex, we took a deep dive into how he made it happen.

Here’s how his team built a hybrid data architecture that prioritizes democratization and access, while ensuring reliability and trust at every turn.

Exercise “Controlled Freedom” when dealing with stakeholders

Alex has built decentralized access to data at Fox on a foundation he calls “controlled freedom.” In fact, he believes using your data team as the single source of truth within an organization actually creates the biggest silo. 

So instead of becoming a guardian and bottleneck, Alex and his data team focus on setting certain parameters around how data is ingested and supplied to stakeholders. Within the framework, internal data consumers at Fox have the freedom to create and use data products as needed to meet their business goals.

“If you think about a centralized data reporting structure, where you used to come in, open a ticket, and wait for your turn, by the time you get an answer, it’s often too late,” Alex said. “Businesses are evolving and growing at a pace I’ve never seen before, and decisions are being made at a blazing speed. You have to have data at your fingertips to make the correct decision.”

To accomplish this at scale, Alex and his centralized data team control a few key areas: how data is ingested, how data is kept secure, and how data is optimized into the best format for publication in standard executive reports. When his team can ensure data sources are trustworthy, data is secure, and the company is using consistent metrics and definitions for high-level reporting, it gives data consumers the confidence to freely access and leverage data within that framework.

“Everything else, especially within data discovery and your ad-hoc analytics, should be free,” said Alex. “We give you the source of the data and guarantee it’s trustworthy. We know that we’re watching those pipelines multiple times every day, and we know that the data inside can be used for X, Y, and Z, so just go ahead and use it how you want. I believe this is the way forward: striving towards giving people trust in the data platforms while supplying them with the tools and skill sets they need to be self-sufficient.”

Invest in a decentralized data team

Under Alex’s leadership, five teams oversee data for the Fox digital organization: data tagging and collections, data engineering, data analytics, data science, and data architecture. Each team has its own responsibilities, but everyone works together to solve problems for the entire business.

“I strongly believe in the fact that you have to engage the team in the decision-making process and have a collaborative approach,” said Alex. “We don’t have a single person leading architecture—it’s a team chapter approach. The power of the company is, in essence, the data. But people are the power of that data. People are what makes that data available.”

While members of different data teams collaborate to deliver value to the business, there’s a clear delineation between analysts and engineers within the Fox data organization. Analysts sit close to the business units, understanding pain points and working to find and validate new data sources. This knowledge informs what Alex and his teams call an STM, or Source to Target Mapping—a spec that essentially allows engineers to operate from a well-defined playbook to build the pipelines and architecture necessary to support the data needs of the business.
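The article does not show what a Fox STM looks like, so the following is only a minimal sketch of how a Source to Target Mapping could be captured as a structured spec that engineers build from. Every name here (ColumnMapping, SourceToTargetMapping, the table and column names) is a hypothetical illustration, not Fox's actual format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnMapping:
    """One source column and where it lands in the target (hypothetical shape)."""
    source_column: str
    target_column: str
    target_type: str
    transform: str = "none"  # free-text rule for the engineer to implement


@dataclass
class SourceToTargetMapping:
    """A spec written by analysts and handed to engineers as a playbook."""
    source: str
    target: str
    columns: list[ColumnMapping] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the spec is usable."""
        problems = []
        if not self.columns:
            problems.append("mapping has no columns")
        targets = [c.target_column for c in self.columns]
        duplicates = sorted({t for t in targets if targets.count(t) > 1})
        for name in duplicates:
            problems.append(f"duplicate target column: {name}")
        return problems


# Assumed example: an analyst maps a vendor feed into the optimized layer.
stm = SourceToTargetMapping(
    source="vendor_feed.page_views",
    target="optimized.page_views",
    columns=[
        ColumnMapping("ts", "event_time", "timestamp", "parse ISO 8601"),
        ColumnMapping("uid", "user_id", "string"),
    ],
)
print(stm.validate())
```

Keeping the spec as data rather than as meeting notes means it can be checked automatically before any pipeline work starts, which fits the division of labor described above.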

This division of labor between analysts and engineers “allows people to focus on their specific areas instead of being spread thin,” said Alex. “Some people may disagree with me, but quite frankly, having developers attend a lot of business meetings can be a waste of their time, because collecting and understanding business requirements is often a strenuous and time-consuming effort. By installing the analytics before engineering gets involved, we can bridge that gap and then allow the developers to do what they do best: building the most reliable, resilient and optimized jobs.”

(It’s worth noting, however, that this decentralized approach won’t work for every organization, and the needs of your team structure will vary based on the SLAs your company sets for data.)

Avoid shiny new toys in favor of problem-solving tech

I’ve been in data for over a decade, and I can say in no uncertain terms that Fox has one of the most robust and elegant data tech stacks that I’ve ever seen. But Alex is adamant that data leaders shouldn’t pursue shiny new tech for its own sake.

“First and foremost, in order to be successful at delivering the right underlying architecture, you need to understand the business,” said Alex. “Don’t chase the latest and greatest technology, because then you’re never going to stop. And sometimes the stack you have right now is good enough—all you have to do is optimize it.”

The Fox data team built their tech stack to meet a specific need: enabling self-service analytics. “We embarked on the journey of adopting a lakehouse architecture because it would give us both the beauty and control of a data lake, as well as the cleanliness and structure of a data warehouse.”

Several types of data flow into the Fox digital ecosystem, including batched, micro-batched, streaming, structured, and unstructured. After ingestion, data goes through what Alex refers to as a “three-layer cake”. 

“First, we have the data exposed at its raw state, exactly how we ingest it,” said Alex. “But that raw data is often not usable for people who want to do discovery and exploration. That’s why we’re building the optimized layer, where data gets sorted, sliced-and-diced, and optimized in different file formats for the speed of reading, writing, and usability. After that, when we know something needs to be defined as a data model or included in a data set, we engage in that within the publishing layer and then build it out for broader consumption within the company. Inside of the published layer, data can be exposed via our tool stack.”
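To make the three layers concrete, here is a toy sketch in plain Python. It assumes a tiny in-memory list of page-view events; the field names and the functions to_optimized and to_published are illustrative inventions, and a real lakehouse would do this work over files and tables rather than Python lists.

```python
from collections import Counter
from datetime import datetime

# Raw layer: records kept exactly as ingested (hypothetical sample events).
raw = [
    {"ts": "2021-06-01T10:05:00", "page": "/sports", "user": "u1"},
    {"ts": "2021-06-01T09:00:00", "page": "/news", "user": "u2"},
    {"ts": "2021-06-01T10:05:00", "page": "/sports", "user": "u1"},  # duplicate
]


def to_optimized(records):
    """Optimized layer: parse types, drop exact duplicates, sort by time."""
    unique = {(r["ts"], r["page"], r["user"]) for r in records}
    rows = [
        {"event_time": datetime.fromisoformat(ts), "page": page, "user": user}
        for ts, page, user in unique
    ]
    return sorted(rows, key=lambda r: (r["event_time"], r["page"], r["user"]))


def to_published(rows):
    """Published layer: a defined data set, here page views per page."""
    return dict(Counter(r["page"] for r in rows))


optimized = to_optimized(raw)
published = to_published(optimized)
print(published)  # {'/news': 1, '/sports': 1}
```

The raw list is never modified, so anyone who needs the data exactly as it arrived can still get it, while most consumers work from the cleaned and published forms.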

The optimized layer makes up the pool of data that Alex and his team provide to internal stakeholders under the “controlled freedom” model. With self-serve analytics, data users can discover and work with data assets that they already know are trustworthy and secure.

“If you don’t approach your data from the angle that it’s easy to discover, easy to search, and easy to observe, it becomes more like a swamp,” said Alex. “We need to instill and enforce some formats and strict regulations to make sure the data is getting properly indexed and properly stored so that people can find and make sense of the data.”

To make analytics self-serve, invest in data trust

For this self-serve model to work, the organization needs to trust that the data is accurate and reliable. To help achieve this goal, the entire data stack is wrapped in QA, validation, and alerting. Fox uses Monte Carlo to provide end-to-end data observability, along with Datadog, CloudWatch alerts, and custom frameworks to help govern and secure data throughout its lifecycle.

“Data observability has become a necessity, not a luxury, for us,” said Alex. “As the business has become more and more data-driven, nothing is worse than allowing leadership to make a decision based upon data that you don’t have trust in. That has tremendous costs and repercussions.”

Alex estimates that the Fox digital organization receives data multiple times a day from over 200 sources. They process nearly 10,000 schemas and tens of billions of records per week. “You can’t scale the team to maintain and support and validate and observe that amount of data. You have to have at least a few tools at your disposal. For us to make sure that we have trust in the data’s timeliness, completeness, and cleanliness, tools like Monte Carlo are ‘must-haves.’ It’s been a great addition to allow us to build an AI-powered overview of what’s happening in our data stack.”
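As a rough illustration of what checking timeliness and completeness can mean in practice, here is a simple rule-based sketch. It is not how Monte Carlo works internally, which the article does not describe; the function names, thresholds, and sample numbers are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def is_fresh(last_loaded_at, now, max_age):
    """Timeliness: the table was loaded within the allowed window."""
    return now - last_loaded_at <= max_age


def is_complete(row_count, recent_counts, tolerance=0.5):
    """Completeness: the latest row count is within `tolerance` (as a
    fraction) of the average of recent loads."""
    if not recent_counts:
        return True  # no history yet, so nothing to compare against
    baseline = sum(recent_counts) / len(recent_counts)
    return abs(row_count - baseline) <= tolerance * baseline


# Hypothetical values for a single table.
now = datetime(2021, 6, 2, 12, 0)
fresh = is_fresh(datetime(2021, 6, 2, 6, 0), now, timedelta(hours=8))
complete = is_complete(400, [1000, 1100, 900])
print(fresh, complete)  # True False
```

Hand-written rules like these are easy to reason about for one table, but maintaining them across thousands of schemas is exactly the scaling problem Alex describes, which is the case for automated tooling.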

The continual monitoring and alerting Monte Carlo provides, along with automated data lineage, helps Alex’s team respond proactively when data incidents occur. “We can catch the issues before they hit production and if they do, we know the level of impact by using reverse-engineering to see how many and what kind of objects have been involved, and we can stop it in-flight before it causes a massive impact downstream. It all comes with trust: the moment you drop transparency or start hiding things, people lose trust and it’s really hard to regain it. I’ve learned that no matter what happens, if you’re being honest and you’re owning the problem, people tend to understand and give you another chance to fix it.”
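The impact analysis Alex describes can be pictured as a walk over a lineage graph. The sketch below assumes lineage is available as a simple mapping from each object to its direct downstream consumers; the object names and the downstream function are hypothetical, and a real observability tool derives and stores lineage in its own way.

```python
from collections import deque

# Hypothetical lineage: each object maps to the objects built directly from it.
lineage = {
    "raw.page_views": ["optimized.page_views"],
    "optimized.page_views": ["published.daily_traffic", "published.top_pages"],
    "published.daily_traffic": ["dashboard.exec_report"],
}


def downstream(graph, start):
    """Breadth-first walk returning every object affected by an issue in `start`."""
    seen = {start}
    queue = deque([start])
    affected = []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


impacted = downstream(lineage, "optimized.page_views")
print(len(impacted), impacted)
```

Counting and listing the affected objects in this way answers the "how many and what kind" question from the quote, and tells the team which downstream owners to notify first.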

With the right tech, the right people, and the right processes in place, Alex and his teams have earned the trust required to build a self-serve data platform that powers decisions on a daily basis. 

I don’t know about you, but I’m sold.

If you’re curious how Monte Carlo can help instill data trust in your organization, book a time to speak with us in the form below.

About FOX Tech

Make Your Mark Here.

At FOX, we pride ourselves in shaking things up and making things happen. We’re a community of builders, operators and innovators and each and every day we experiment, collaborate, and co-create to develop the next world of news, sports & entertainment streaming technology.

While being one of the most well-known brands in the world, we provide our employees with the culture of a start-up— fast paced, non-hierarchical, full of smart ideas & innovation and most importantly, the knowledge that each member of the team is making a difference in defining what’s next for FOX Tech. Simply put, we love to do great work, with great people.

Learn more and join our team: