As founders of companies that build solutions designed to help teams deliver on the promise of data, we knew we wanted to build great products that are easy to deploy and manage for our customers.
We also knew that since we would be integrating with our customers’ data stacks, we would need to offer the highest level of security and compliance. The question was: how are we going to build them? SaaS? On-prem? Something else?
To meet these goals, we chose a hybrid deployment architecture, a new approach that marries on-prem security with SaaS convenience. Here’s why.
With your company ingesting and storing more data than ever before, and with increased scrutiny of how that data is accessed and used, adding any new third-party solution to your existing data infrastructure comes with its fair share of security concerns.
In fact, in the age of GDPR, CCPA, HIPAA and many other important acronyms, managing complex data systems and keeping sensitive data secure are two distinct problems that don’t have a one-size-fits-all solution.
For data leaders, choosing which problem to address can feel like an exercise in picking your poison.
The traditional on-prem deployment model
On-premises (on-prem) refers to the traditional deployment model in which the software runs in the customer’s environment, often inside a dedicated VPC. In particular, all data that the service stores or processes remains in the customer’s cloud.
While the vendor writes the code, the customer maintains full control and ownership of the data.
In an on-prem deployment, the vendor provides a binary and a license key (in the case of licensed software), and the customer manages the deployment. It is the traditional deployment model of choice for countless software products. Examples include MemSQL and Splunk’s early on-prem offerings, which rely on the customer’s engineering and IT teams to handle the deployment. Compared to SaaS solutions, on-prem platforms often provide greater customization and configuration based on the customer’s needs.
For the customer, an obvious benefit of choosing a vendor that leverages an on-prem architecture is rooted in perceived security and compliance. By keeping the data in the customer’s environment, an on-prem architecture exposes no connectivity to external parties. Moreover, the vendor cannot access any sensitive information since all of the data and software are housed in the customer’s cloud.
However, an on-prem deployment model requires that the customer shoulder most of the operational overhead. The customer must troubleshoot disaster recovery situations, such as application outages and data downtime, which can be time-intensive and lead to a subpar experience.
A second limitation of the on-prem model is lack of speed-to-deployment, both of the baseline software and any future product updates. Since the software lives in the customer’s environment, upgrades can be a tedious process that requires extensive permissions and additional resources.
The SaaS model
Software-as-a-Service (SaaS) solutions offer off-the-shelf software hosted in the vendor’s cloud that can be provisioned and used instantaneously by customers. In this model, the software is run and managed by the vendor, with the customer data stored in the vendor’s cloud. The model was pioneered by Salesforce; notable more recent examples from the data world include Snowflake, Segment, and Chartio.
The SaaS model makes it easy for vendors to make updates, roll out new features, and address common pain points at scale, as opposed to pushing changes to individual customers’ environments. For many, this creates a more delightful user experience, often at a lower cost. This also extends to maintenance on the software, which is outsourced to the vendor who knows the software best.
When you throw data into the mix, the SaaS model becomes a bit more complicated, particularly as it relates to compliance requirements and data lock-in.
While any self-respecting SaaS provider will encrypt your data at rest, it is still locked away in the vendor’s environment. As a result, many customers are unwilling (or for regulatory reasons, unable) to hand off the management and storage of data entirely.
Even if a customer is comfortable signing away the responsibility for securely storing the data outside of their environment, they still have to accept the fact that data is now fully locked-in and under the vendor’s control.
So, what does it take to get the compliance and flexibility benefits of an on-prem solution with the ease-of-deployment and convenience of a SaaS vendor?
We believe that there’s a better way forward for modern data products: a hybrid architecture.
The hybrid deployment model
Over the past decade, we’ve seen software engineering and DevOps teams across industries increasingly leverage hybrid cloud architectures to manage infrastructure-as-a-service applications, including New Relic and Atlassian. More recently, many data software vendors have made a similar design decision.
To marry the best of the SaaS world and on-prem world for the modern data stack, buyers of data software should consider solutions that incorporate a hybrid architecture. This approach is composed of two parts: (1) a control plane managed by the vendor, and (2) a data plane in the customer’s environment.
The control plane
The control plane typically hosts the majority of the software’s business logic and handles non-sensitive metadata. It communicates with the data plane and delegates sensitive operations (such as processing, storing or deleting data) to it. The control plane also provides web and API interfaces, and monitors the health of the data plane. The control plane runs entirely in the vendor’s environment and typically follows a multi-tenant architecture, though some vendors offer a single-tenant control plane (often for a price premium) that runs in a customer-dedicated, completely isolated VPC.
The data plane
The data plane typically processes and stores all of the customer’s sensitive data. It must be able to receive instructions from the control plane, and pass back metadata about its operations and health. Technically, the interface between the control and data plane is often implemented by a thin agent that runs in the customer’s environment. Some vendors are even able to skip an agent altogether and rely entirely on cross-account IAM roles.
At its essence, separating the customer’s data from the managed software gives customers the agility of a SaaS product with the compliance and data ownership of an on-prem solution, while keeping customer data in the customer’s cloud environment at all times.
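The split between the two planes can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not how any particular vendor implements it: the class names (Instruction, OperationReport, DataPlaneAgent, ControlPlane) and the operation names are hypothetical, and the in-process method call stands in for what would be a network interface in practice. The point it shows is that instructions flow toward the data, and only metadata flows back.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instruction:
    """A command sent from the control plane to the data plane (hypothetical shape)."""
    operation: str  # e.g. "row_count" or "delete_all"
    table: str


@dataclass
class OperationReport:
    """Metadata passed back to the control plane. It carries no customer records."""
    operation: str
    table: str
    status: str
    row_count: int


class DataPlaneAgent:
    """Thin agent that runs in the customer's environment, next to the data."""

    def __init__(self, tables: dict[str, list[dict]]):
        self._tables = tables  # sensitive data; only counts derived from it are reported

    def execute(self, instruction: Instruction) -> OperationReport:
        rows = self._tables.get(instruction.table)
        if rows is None:
            return OperationReport(instruction.operation, instruction.table, "table_not_found", 0)
        if instruction.operation == "row_count":
            return OperationReport("row_count", instruction.table, "ok", len(rows))
        if instruction.operation == "delete_all":
            deleted = len(rows)
            rows.clear()
            return OperationReport("delete_all", instruction.table, "ok", deleted)
        return OperationReport(instruction.operation, instruction.table, "unsupported", 0)


class ControlPlane:
    """Vendor-hosted side: holds business logic and stores only operation reports."""

    def __init__(self, agent: DataPlaneAgent):
        self._agent = agent
        self.reports: list[OperationReport] = []

    def run(self, operation: str, table: str) -> OperationReport:
        # Delegate the sensitive operation; keep only the metadata that comes back.
        report = self._agent.execute(Instruction(operation, table))
        self.reports.append(report)
        return report


agent = DataPlaneAgent({"users": [{"email": "a@example.com"}, {"email": "b@example.com"}]})
control = ControlPlane(agent)
report = control.run("row_count", "users")
print(report.status, report.row_count)  # prints: ok 2
```

Because OperationReport has no field that can hold a record, the control plane in this sketch cannot accumulate customer data even by accident, which is the property the hybrid model is after.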
Quicker onboarding and time to value
The hybrid architecture enables customers to deploy software quickly and often with very little manual overhead.
In turn, this speedy onboarding lets customers see near-immediate impact from the product and, in the short term, near-immediate value from their data or ML models.
As part of this deployment model, vendors with hybrid solutions typically provide on-call support as a key feature of their product — almost like an embedded SRE team for their customers.
Manage complex infrastructure and sensitive customer data independently
One of the biggest benefits of the hybrid model is that it frees customers from needing to configure or maintain the vendor’s complex infrastructure, while giving customers complete control over their data.
One way to do this is to give the vendor access to the “data plane” through an agent or cross-account roles, extracting information such as metadata, query logs and aggregated statistics. Unlike many SaaS products, no individual records or PII are ever taken out of the customer’s data warehouses, lakes, or BI tools and stored on the vendor’s cloud.
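As a rough illustration of what "metadata and aggregated statistics, but no individual records" can mean in practice, here is a minimal Python sketch. The function name summarize_table and the shape of its output are assumptions made for this example; a real agent would read from a warehouse rather than an in-memory list, and would need to treat column names themselves as potentially sensitive.

```python
from __future__ import annotations

from typing import Any


def summarize_table(name: str, rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Return schema and aggregate statistics only; no field values are copied."""
    # Union of column names seen across all rows, sorted for stable output.
    columns = sorted({col for row in rows for col in row})
    # A column counts as null in a row if it is missing or explicitly None.
    null_counts = {
        col: sum(1 for row in rows if row.get(col) is None) for col in columns
    }
    return {
        "table": name,
        "row_count": len(rows),
        "columns": columns,
        "null_counts": null_counts,
    }


rows = [
    {"email": "a@example.com", "age": 31},
    {"email": None, "age": 45},
    {"email": "c@example.com"},
]
summary = summarize_table("users", rows)
print(summary["row_count"], summary["null_counts"])  # prints: 3 {'age': 1, 'email': 1}
```

Only the summary dictionary would be sent to the vendor in this sketch; the rows, including the email addresses, stay where they are.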
The hybrid approach also facilitates the addition of “knobs” that control the extent of account permissions for the vendor (i.e., the more permissive, the less management on the customer’s side, and vice versa). This gives customers greater agency over data access and security, which is critical for industries such as FinTech and healthcare where sensitive data abounds and the margin of acceptable error is low to non-existent.
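One way to picture these “knobs” is as a small set of switches that the customer controls and the agent checks before doing anything on the vendor’s behalf. The Python sketch below is hypothetical throughout: the knob names, the operation names, and the deny-by-default rule are assumptions chosen for illustration, not a description of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorPermissions:
    """Customer-controlled knobs; defaults are the least permissive useful setting."""
    read_metadata: bool = True
    read_query_logs: bool = False
    read_aggregates: bool = False


# Which knob gates which vendor operation (hypothetical operation names).
OPERATION_KNOBS = {
    "list_tables": "read_metadata",
    "fetch_query_logs": "read_query_logs",
    "compute_aggregates": "read_aggregates",
}


def is_allowed(operation: str, permissions: VendorPermissions) -> bool:
    """Check an operation against the knobs; unknown operations are denied."""
    knob = OPERATION_KNOBS.get(operation)
    return knob is not None and getattr(permissions, knob)


strict = VendorPermissions()
permissive = VendorPermissions(read_query_logs=True, read_aggregates=True)

print(is_allowed("fetch_query_logs", strict))      # prints: False
print(is_allowed("fetch_query_logs", permissive))  # prints: True
print(is_allowed("read_rows", permissive))         # prints: False
```

Turning more knobs on shifts work to the vendor, and turning them off keeps more under the customer’s control. In this sketch, reading individual rows is not an operation at all, so no knob setting can enable it.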
Because the vendor manages the service’s compute resources, any issues with the product can be quickly resolved by the vendor without burdening the customer. A data agent makes this possible: the vendor’s infrastructure can be maintained, debugged, and updated without effort or resources from the customer.
Pro-tip for vendors reading this: we recommend getting your SOC 2 certification early; you’ll thank us later. Many enterprises, particularly in industries subject to GDPR, HIPAA, and SOX, will require this before they even consider working with you.
Fast and continuous software upgrades
The hybrid deployment model keeps stable cloud primitives such as S3, EMR, and DynamoDB in the customer cloud and all of the ever-improving, fast-moving pieces of infrastructure (i.e., the product itself) in a managed cloud. As a result, customers can integrate new solutions into their data stack more easily, as the cost of trying out and getting started with a hybrid-model vendor is significantly lower than it is for complex on-prem software. Hosting the service in the vendor’s environment also makes rolling out updates for all customers much easier and more seamless than if it were hosted in the customer’s private cloud.
It also means that every customer gets access to new features, so innovation and product development are not driven in silos. Customers don’t even have to be aware of different software versions and slow upgrade cycles anymore – they can rest assured that they’ll always be using the latest and greatest release, fully automatically.
Hybrid models also give customers flexibility over how they use the product: they can deploy it across their entire stack or only in a few select data environments, and can easily add or remove instances of the service as necessary.
Charting the path forward for flexible, secure data stacks
While we did not know each other when we founded Monte Carlo and Tecton, we ended up choosing a similar architecture for our products. This hybrid model ended up being critical in our ability to support data and ML organizations while also getting a seal of approval from security teams.
By leveraging a hybrid SaaS/on-prem architecture, solutions providers can build data products that are easy to deploy, require little to no operational overhead on the part of the customer, facilitate full data ownership, and, perhaps most significantly, ensure the utmost data security and compliance.
At the end of the day, why shouldn’t you have it both ways?
This article was cross-posted on Tecton’s blog.