When your customers are the first to know about data gone wrong, their trust in your data—and your company—is damaged. Learn how the data engineering team at logistics company Optoro faced this challenge head-on, reclaiming data trust and valuable time with data observability at scale. 

Washington, DC-based Optoro has an admirable mission: to make retail more sustainable by eliminating all waste from returns. They provide return technology and logistics for leading retailers like IKEA, Target, and Best Buy, helping increase profitability while reducing environmental waste through recommerce, or finding the “next best home” for returned items.

I recently had the opportunity to chat with Patrick Campbell, Lead Data Engineer at Optoro, during Fivetran’s Data Engineer Appreciation Day. We took a deep dive into Optoro’s data team and tech stack, the challenges they face, and how data observability helps them chase their mission of building a more sustainable retail industry.

The data landscape at Optoro

On the surface, Optoro reroutes returned or unsold items for retailers. But beyond moving large quantities of merchandise, Optoro is really in the business of moving data. A lot of data.

“Our technology platform connects every item to its next best home,” says Patrick. “As you can imagine, this system creates many mission-critical data points as we route return inventory through our system.” 

The Data Engineering team at Optoro recently moved under the engineering organization to work more effectively with tech and product teams, but they naturally collaborate with data quality, data science, and data analytics groups. Optoro also has many data consumers, both internal and external, accessing data through Looker dashboards. But that data wasn't always reliable.

The challenge: data integrity

“We needed insight into the quality of our data, plain and simple,” says Patrick. “We didn’t have a good method for understanding when data might be missing, when it might go stale, or if the data isn’t what we expected.”

That meant that when data issues did occur, customers (not Optoro’s data team) were often the first to know. This would lead to customer dissatisfaction and prevent Optoro from delivering reliable information about the inventory their software manages.

The solution: data observability 

Monte Carlo’s alerting workflow notified Patrick’s team of anomalies in a specific warehouse, triggered by a distribution issue. Image courtesy of Optoro.

Patrick's team needed to get between their customers and bad data, and they had two options: build or buy. Patrick considered building custom SQL integrity checks with dbt, but knew that with limited resources, his team could only partially cover Optoro's many pipelines, and maintaining those checks would add to the Data Engineering team's already significant workload for the long term.
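To make the "build" option concrete, here is a minimal sketch of the kind of hand-rolled freshness check a team would have to write, and then maintain, for each pipeline. The table name, timestamp column, and SLA are invented for illustration, and sqlite3 stands in for the warehouse:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed SLA for the example; not a figure from the article.
FRESHNESS_SLA = timedelta(hours=6)

def is_stale(conn: sqlite3.Connection, table: str, ts_column: str,
             now: datetime, sla: timedelta = FRESHNESS_SLA) -> bool:
    """Return True if the newest row in `table` is older than the SLA."""
    row = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()
    latest = row[0]
    if latest is None:  # an empty table counts as stale
        return True
    latest_ts = datetime.fromisoformat(latest)
    return now - latest_ts > sla

# Demo with an in-memory database standing in for the warehouse;
# `returns_inventory` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE returns_inventory (updated_at TEXT)")
conn.execute("INSERT INTO returns_inventory VALUES (?)",
             ("2024-01-01T00:00:00+00:00",))
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(conn, "returns_inventory", "updated_at", now))  # 12h old, so True
```

One such check is trivial; the cost is in writing, tuning, and maintaining a variant of it (plus schema, volume, and distribution checks) across hundreds of tables, which is the workload a data observability platform is meant to absorb.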

Instead, they chose to create a proof-of-concept with Monte Carlo to see how far our data observability platform could take them toward solving data quality issues. The platform uses machine learning to infer and learn what data should look like, identify data quality issues, and notify the right team members when something goes wrong. 

To succeed, the POC had to show that Optoro could achieve the following: 

  • Alerting on stale data products
  • Alerting on large pipeline changes, such as schema changes
  • Always-on monitoring and alerting, even when operational databases safely fail over
  • Automatically generated rules based on expected data behavior
  • The ability to write custom SQL checks to monitor specific use cases

These outcomes would help Patrick’s team achieve their end goal of preventing data quality issues from negatively impacting the customer experience. 

Optoro’s Data Platform leverages a Snowflake warehouse, Fivetran for integration, dbt for transformation, Monte Carlo for Data Observability, and Looker for analytics. Image courtesy of Optoro.

We quickly delivered what Optoro was looking for. Patrick and his team moved forward to fully integrate Monte Carlo, layering it alongside Snowflake, Fivetran, dbt, and more in Optoro’s data stack. These tools work together from ingestion to transformation to BI reporting, with Monte Carlo keeping an eye on things at every stage of the data lifecycle.

The outcome: achieving trusted data through lineage

Monte Carlo lets Patrick’s team map end-to-end lineage across their data assets, down to the field level in Looker. Image courtesy of Optoro.

With Monte Carlo’s monitoring and alerting in place, Patrick and his team are now the first to know when data goes missing or pipelines break. And when alerts do come in, the data engineers can resolve issues faster thanks to the automated lineage Monte Carlo provides.

“We can get a visual on affected data sources, from internal data marts all the way downstream to our Looker reports that could be client-facing,” Patrick says. “Being able to quickly identify client-facing issues and be proactive is really the key to building trust in our data. And this feature makes the data engineers’ jobs much, much easier—I can tell you definitely from experience here.” 

Fully automated, end-to-end data lineage that requires no manual mapping or updates is one of Patrick's favorite parts of the platform, and a relief for Optoro's data engineers. 

“The fact that the Monte Carlo system is able to build this lineage itself is remarkable,” says Patrick. “This required little to no input from our data teams in terms of structuring upstream and downstream dependencies.”

The outcome: data teams saving time and stepping up 

The Optoro data engineering team also estimates that using Monte Carlo saves them at least four hours per engineer, per week, on support tickets to investigate bad data. With a data engineering team of 11+ members, that adds up to at least 44 hours each week.

And since all data teams—not just engineers—can access self-service monitoring and alerting, data catalog views, and lineage through Monte Carlo, Patrick reports that other data teams are stepping up to take more ownership of data and more responsibility in the products they’re shipping. 

“Not only is this a huge win for Data Engineering in terms of trying to track down the needle in the haystack issues, but it helps us enable other data teams to help us keep trust in our data,” says Patrick. “Putting these frameworks in place takes Data Engineering out of being the middleman or woman in these situations…Data integrity really should be self-service. And your data engineers will thank you.”

Curious how data observability can help your team build trust and save time? Reach out to the Monte Carlo team to learn more!