Tell me if this rings a bell: your boss is convinced that Snowflake is the future of data and informs your team that you need to migrate from your data warehouse to this sparkling new solution more than anything you’ve ever needed in your lives. Isolating storage from compute will save your company so much money, and on top of that, your VP can generate fancy new dashboards for your CEO to track.
Snowflake, a cloud data warehousing platform, makes it easy for data teams to store and use data. Unlike traditional storage solutions, Snowflake supports a plethora of data types and business intelligence tools and makes it easy for internal and external teams to collaborate throughout the ETL pipeline. A relational database, Snowflake can also support most structured and unstructured data types.
Like your VP, many of my customers are excited at the prospect of migrating to a cloud storage and compute solution like Snowflake but they don’t know where to start. Rightly so: I was able to find several articles about migrating from Redshift to Snowflake, but very little about making the polar plunge from other solutions.
After talking to several migrators in the field, I broke down some lesser-discussed considerations for teams moving to Snowflake, regardless of where you’re starting from:
1. Say goodbye to partitions and indexes.
Unlike other data warehouses, Snowflake does not support partitions or indexes. Instead, Snowflake automatically divides large tables into micro-partitions, which are used to calculate statistics about the value ranges each column contains. These insights then determine which parts of your data set you actually need to run your query.
For most practitioners, this paradigm shift from indexes to micro-partitions really shouldn’t be an issue (in fact, many people choose to migrate to Snowflake because this approach reduces query latency). Still, if you have partitions and indexes in your current ecosystem and are migrating to “clustering” models, you need a sound approach. A few tips for a safe migration:
- Document current data schema and lineage. This will be important for when you have to cross-reference your old data ecosystem with your new one.
- Analyze your current schema and lineage. Next, determine if this structure and its corresponding upstream sources and downstream consumers make sense for how you’ll be using the data once it’s migrated to Snowflake.
- Select appropriate cluster keys. This will ensure the best query performance for your team’s access patterns.
Bidding adieu to partitions and indexes is nothing to lose sleep over as long as you have visibility into your data.
2. Expect (and embrace) syntax issues.
Several data teams I spoke with repeatedly called out syntax issues as an inevitable component of any cloud warehouse migration, and a migration to Snowflake is no exception.
One data analyst specifically called out the difficulty converting SSIS packages for handling ETL from her SQL Server to Snowflake, which admits that SSIS packages are not easily integrated with their solution. Such errors were not only frustrating, but substantially slowed down her migration, leading to unforeseen costs and resource constraints.
While modeling solutions like DBT help with validating data sets, the formatting of functions like hashing, time stamps, and dates are often inconsistent between old and new versions of the data.
Additionally, Snowflake is case sensitive, so it’s important that you check for comparison issues in queries. As a result of these issues, some companies can expect to inspect and refactor ALL lines of SQL being migrated.
Syntax errors become a bigger pain point for companies in traditional industries, such as financial services or healthcare (ICD10 codes, I’m looking at you) that have long relied on legacy solutions and manual, error-prone data input. Simply moving to the cloud won’t fix these issues. As one data analyst at a public sector consulting firm told me: “Even if you hire amazing people and put the best data dictionary in front of them, they probably can’t tell you what it all means.”
The sooner you accept syntax errors as a part of the process, the easier it is to identify trends and patterns in these inconsistencies that can expedite their resolution.
3. Monitor your data, always and often.
Similar to syntax errors, data issues can cause even the smoothest Snowflake migrations to fail, generating false or misleading analysis once you hook up your business intelligence tools. Oftentimes these will result in silent errors that will go unnoticed until a consumer downstream catches an issue in a report or dashboard. If you’re lucky, it’s an internal user — and if you’re not, it might just be that new important customer you onboarded only last week and are trying to impress.
Another analyst we spoke with at a digital marketing consultancy noted that it can be hard to ensure comprehensive data definition between your old and new data warehouses. After a few data errors popped up in her company’s new Snowflake warehouse, she decided to test the reliability of their data by evolving two parallel data analytics layers, one her legacy warehouse and one via Snowflake. Using Looker to generate metrics for both stacks, they quickly determined that there were, in fact, inconsistencies between the two warehouses, with each set of metrics presenting different data volumes.
When upgrading your data warehouse, make sure you’re also upgrading the way your team operates, from small things like syntax concurrency all the way to data quality and reliability.
You’ve invested so much in this migration (rightfully so!), it’d be silly to let it all go to waste if the data itself can’t be trusted.
Master your migration
If you move on from indexes and partitions, expect syntax issues, and prioritize data quality, you’ll achieve a more seamless Snowflake migration, facilitating easier collaboration and delivering true business value for your organization.
Moving to Snowflake means more flexibility and scalability for your team, as well as quicker, more reliable insights for your customers — and if you do it right, it can be a force multiplier for your entire organization, too.
Don’t worry: you’ll also impress your boss. I guarantee it.
If you want to learn more, reach out to Barr Moses.