Data Migration Risks and the Checklist You Need to Avoid Them
Data migration refers to the movement of data from one location (warehouse or database), format, or SaaS application, to another. Data migration risks are unforeseen issues that:
- Extend the length of the migration
- Lower productivity in the new environment
- Generate internal or external compliance violations
- Expose or lose valuable or sensitive information
Data migration has always been important, but it has come to the forefront in recent years because so many organizations are undergoing digital transformation and migrating from on-premises solutions to the cloud. Many are now also launching cloud-to-cloud migrations as they consolidate services or optimize their infrastructure.
On the face of it, data migration should be fairly straightforward: just pick up your data and move it where it needs to go. After all, as data engineers, aren’t we migrating data all the time within our ETL pipelines? And in a system migration, we control both the source and the target, with no third parties involved!
In reality, it’s never quite that simple. Sure, terabytes or even petabytes of data are involved, but it’s generally not the size of the data that poses data migration risks. It’s everything surrounding the data: workflows, access permissions, and layers of dependencies.
Data migrations are also highly visible projects, and failed attempts carry severe consequences for both the organization and the project manager.
That’s not to mention the data migration risks of data corruption, duplication, or inconsistencies. In other words, there’s plenty that can go wrong when migrating data. In this post, we’ll look at how to mitigate those risks.
Table of Contents
- Risks Associated with Data Migration
- 5 Tips to Mitigate Data Migration Risks
- The Future of Data Migration
Risks Associated with Data Migration
A recent Experian study found that 64% of the data migration projects it analyzed went over budget and only 46% were delivered on time. Fewer than 70% of projects were considered a success. It would be naïve to expect any migration project to go off without a hitch.
Here are a few common data migration risks:
Data loss
Sometimes data goes missing during the migration process. This can be due to format incompatibilities, automatic truncation, unknown validation settings, and network interference, among other reasons.
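Truncation is one of the easier causes to catch before it happens. The sketch below is a minimal example, not a prescribed method. It checks every source value against a hypothetical target schema (`TARGET_MAX_LENGTHS`, standing in for `VARCHAR(n)` limits) and lists the values that would be cut off. The column names and limits are illustrative assumptions.

```python
# Hypothetical target schema: column name -> maximum length allowed (e.g. VARCHAR(n)).
TARGET_MAX_LENGTHS = {"customer_name": 20, "country_code": 2}


def find_truncation_risks(rows, max_lengths):
    """Return (row_index, column, value) for every value longer than the target allows."""
    risks = []
    for i, row in enumerate(rows):
        for column, limit in max_lengths.items():
            value = row.get(column)
            # Missing values (None) cannot be truncated, so they are skipped.
            if value is not None and len(str(value)) > limit:
                risks.append((i, column, value))
    return risks


source_rows = [
    {"customer_name": "Ada Lovelace", "country_code": "GB"},
    {"customer_name": "Bartholomew Montgomery-Smythe", "country_code": "USA"},
]
risks = find_truncation_risks(source_rows, TARGET_MAX_LENGTHS)
```

Running a check like this before the first load turns silent truncation into an explicit list of rows to fix, or of columns to widen in the target.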
Database differences and schema management
Each database, even in the cloud, stores values a little differently, and those little differences can become big data migration risks. For example, one data leader told us how two data warehouses stored dollar amounts differently. One stored them as integers (in cents), which meant a value listed as $100.00 in the source database would have become 10,000 in the target database.
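A cheap safeguard against unit mismatches like this is to convert explicitly and then reconcile totals in a common unit. The sketch below assumes the source holds decimal dollar strings and the target holds integer cents. The function names are hypothetical. It uses `Decimal` rather than floats so that amounts like 19.99 convert exactly.

```python
from decimal import Decimal, ROUND_HALF_UP


def dollars_to_cents(amount: str) -> int:
    """Convert a decimal dollar string such as '100.00' to integer cents (10000)."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def cents_to_dollars(cents: int) -> Decimal:
    """Convert integer cents back to a two-decimal dollar amount."""
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))


def totals_match(source_dollars, target_cents) -> bool:
    """Post-migration reconciliation: column sums must agree once units are normalized."""
    source_total = sum(Decimal(a) for a in source_dollars)
    target_total = sum(cents_to_dollars(c) for c in target_cents)
    return source_total == target_total


source_amounts = ["100.00", "19.99"]
target_amounts = [dollars_to_cents(a) for a in source_amounts]  # [10000, 1999]
```

If a load writes dollar values into the cents column without converting them, `totals_match` returns `False`. You want that signal before anyone builds a report on the new warehouse.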
Deciding what to migrate
Ideally, you don’t want to migrate every single table or field, only what is being used or will be valuable in the new environment. Not only will this save money during the migration, it will keep you from junking up your new environment. After all, you just escaped your data swamp, right?
But because data is highly interdependent, with multiple tables feeding others through layers of transformations and queries, it can be difficult to assess what to ship and what to toss. That’s where field-level data lineage and data health insights can help mitigate this type of data migration risk.
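To make the lineage idea concrete, here is a minimal sketch. It walks dependencies upstream from the assets still in use, and anything the walk never reaches becomes a candidate to leave behind. For brevity it works at table level. Field-level lineage works the same way with `(table, column)` nodes. The `UPSTREAM` graph and all table names are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each table maps to the upstream tables it is built from.
UPSTREAM = {
    "exec_dashboard": ["revenue_daily"],
    "revenue_daily": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "fx_rates": [],
    "orders_raw": [],
    "legacy_report": ["orders_2015"],
    "orders_2015": [],
}


def tables_to_migrate(in_use, upstream):
    """Breadth-first walk upstream from in-use assets; everything reached must move."""
    keep, queue = set(), deque(in_use)
    while queue:
        table = queue.popleft()
        if table in keep:
            continue
        keep.add(table)
        queue.extend(upstream.get(table, []))
    return keep


keep = tables_to_migrate(["exec_dashboard"], UPSTREAM)
candidates_to_toss = sorted(set(UPSTREAM) - keep)  # tables nothing in use depends on
```

Treat `candidates_to_toss` as a review list, not an automatic delete list. Lineage only captures the dependencies that have actually been recorded.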
Integrations
A typical modern data stack has multiple tools integrated with one another as part of the larger data platform. Most of these integrations will carry over, but if you use a more niche tool, you could find that parts of your platform no longer integrate as smoothly. You’ll also need to rejigger any data connectors you have set up. Failures here could be disastrous if they affect key work or data flows.
Data corruption
Think of a backup hard drive or SD card that refuses to work, only on a much bigger scale.
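Corruption is hard to predict, but it is fairly easy to detect after the fact: compare per-row fingerprints between source and target. The sketch below assumes both sides can be read back as tuples whose first element is a primary key. It also assumes the values come back as the same Python types on both sides. `row_digest` and `corrupted_rows` are illustrative names, not part of any migration tool.

```python
import hashlib


def row_digest(row) -> str:
    """SHA-256 of a row's values in a canonical text form (repr keeps None distinct from '')."""
    canonical = "\x1f".join(repr(v) for v in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def corrupted_rows(source_rows, target_rows, key_index=0):
    """Return sorted keys whose digests differ or that exist on only one side."""
    source = {row[key_index]: row_digest(row) for row in source_rows}
    target = {row[key_index]: row_digest(row) for row in target_rows}
    return sorted(k for k in source.keys() | target.keys() if source.get(k) != target.get(k))


src = [(1, "alice", 100), (2, "bob", 250), (3, "carol", None)]
tgt = [(1, "alice", 100), (2, "b\ufffdb", 250), (3, "carol", None)]  # row 2 garbled in transit
```

In practice, you would normalize types first (for example, read decimals as strings on both sides). Otherwise a harmless type change shows up as corruption.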
Data duplication
When pulling from multiple sources, or when re-running failed jobs, you might end up with the same data entered more than once.
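Two common defenses are checking for repeated keys after each load and making loads idempotent, so that a re-run overwrites rows instead of appending them. In this minimal sketch, a Python dict stands in for the target table, and `id` is a hypothetical natural key. In a real warehouse, the equivalent would be a `MERGE` or upsert on that key.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the sorted keys that appear more than once in a loaded batch."""
    counts = Counter(key(r) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def idempotent_load(target: dict, batch, key):
    """Upsert by natural key so re-running a failed batch overwrites instead of appending."""
    for row in batch:
        target[key(row)] = row
    return target


by_id = lambda r: r["id"]
appended = [{"id": 1}, {"id": 2}, {"id": 1}]  # what a naive append looks like after a re-run

target = {}
batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
idempotent_load(target, batch, key=by_id)
idempotent_load(target, batch, key=by_id)  # simulated re-run of the same batch
```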
Data governance, compliance and access management
Moving a table is relatively simple. But what about the permissions and policies surrounding that table? Most of the time, those will need to be refactored. Security can also be a challenge if the migration involves unstructured data: you could be placing sensitive information or PII at risk without knowing it.
Although we can take steps to prevent most of these issues from occurring (more on that below), we can’t necessarily control all of them. Data corruption, for example, is difficult to predict in advance. Fortunately, we can take measures to make sure it’s not the end of the world if something goes wrong during the migration.
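Refactored permissions are easier to verify if you export grants from both systems and diff them. The sketch below assumes grants can be flattened into `(principal, privilege, object)` triples. It also takes an optional `role_map` that translates source role names to their target equivalents. Both the triple format and the names are hypothetical.

```python
def diff_grants(source_grants, target_grants, role_map=None):
    """Compare grant sets after translating source principals via role_map."""
    role_map = role_map or {}
    source = {(role_map.get(p, p), priv, obj) for p, priv, obj in source_grants}
    target = set(target_grants)
    return {
        "missing_in_target": sorted(source - target),     # access that was lost
        "unexpected_in_target": sorted(target - source),  # access that was widened
    }


src = [("analyst", "SELECT", "sales.orders"), ("etl_user", "INSERT", "sales.orders")]
tgt = [("analyst", "SELECT", "sales.orders"), ("public", "SELECT", "sales.orders")]
report = diff_grants(src, tgt)
```

The "unexpected" side matters as much as the "missing" side. A grant that is broader in the target than in the source is exactly how sensitive data ends up exposed without anyone noticing.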
5 Tips to Mitigate Data Migration Risks
As we’ve seen above, data migration is not without obstacles. However, data migration risks shouldn’t be what stops you from taking your data to where you can extract the most value.
Here are some tips for mitigating the risks of data migration. Curious to learn from someone who has recently done it? Check out these data warehouse migration steps and tips from the vice president of engineering at Unbounce.
Plan and prepare
As the old saying goes, “when you fail to plan, you plan to fail.” Rather than rushing into data migration, take some time to plan out exactly what you want to achieve from it.
- Assess the current state of the data: what does it look like now?
- Identify the target state of the data: what do you want it to look like afterwards?
- Identify any ROT (redundant, obsolete, and trivial data) in your source that doesn’t need to be migrated.
- Determine which workflows and dependencies you need to rebuild in the new environment.
- Set clear migration objectives: what do you want the data to be able to do?
- Decide how you will communicate with your end users.
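Of the planning questions above, finding ROT is the one most amenable to automation. The sketch below is a rough first pass. It assumes you can pull a last-queried timestamp per table from your warehouse's query history, and it flags tables nobody has touched recently. The 180-day threshold and the table names are illustrative. This only catches obsolete data; redundant and trivial data need other checks.

```python
from datetime import datetime, timedelta


def flag_rot(tables, now, stale_after_days=180):
    """Split tables into (migrate, review) lists based on last-query time.

    `tables` maps table name -> last queried datetime, or None if never queried.
    """
    cutoff = now - timedelta(days=stale_after_days)
    migrate, review = [], []
    for name in sorted(tables):
        last_queried = tables[name]
        if last_queried is None or last_queried < cutoff:
            review.append(name)  # candidate ROT: confirm with owners before dropping
        else:
            migrate.append(name)
    return migrate, review


now = datetime(2024, 6, 1)
tables = {
    "orders": datetime(2024, 5, 30),
    "tmp_backup_2019": datetime(2019, 1, 1),
    "scratch": None,
}
migrate, review = flag_rot(tables, now)
```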
Assess vendors and tools
We don’t sell a migration solution and we don’t consult, so we have no ulterior motive in this advice. Most large data lake, data warehouse, and data cloud providers offer tools (usually free) built to help you move to their environment.
It’s in their best interest to help you move quickly and efficiently so they can turn on the meter. One of the best ways to mitigate data migration risks is to use purpose-built tools and learn from those who have experience running large-scale migrations.
Do a test run
Once you’ve chosen your approach, whether that means self-scripted, on-premises, or cloud-based tools, a pilot run is a great next step.
A pilot migration might involve setting up a test environment, migrating a data set, using reporting and/or business intelligence tools to confirm that the migration succeeded, and looking for ways the actual migration could be improved.
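One lightweight way to judge a pilot is to compare a simple profile on both sides: row count, plus the sum, min, and max of a key numeric column. The sketch below uses two in-memory SQLite databases purely as stand-ins for source and target. The `orders` table and `amount_cents` column are hypothetical.

```python
import sqlite3


def profile(conn, table, amount_col):
    """Row count plus sum/min/max of one numeric column: a cheap fingerprint to compare."""
    # Identifiers are interpolated directly, so only pass trusted table/column names.
    return conn.execute(
        f"SELECT COUNT(*), SUM({amount_col}), MIN({amount_col}), MAX({amount_col}) FROM {table}"
    ).fetchone()


def reconcile(source, target, table, amount_col):
    src, tgt = profile(source, table, amount_col), profile(target, table, amount_col)
    return {"source": src, "target": tgt, "match": src == tgt}


# Pilot: copy a sample table between two in-memory databases, then compare profiles.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10000), (2, 1999), (3, 550)])
rows = source.execute("SELECT id, amount_cents FROM orders").fetchall()
target.executemany("INSERT INTO orders VALUES (?, ?)", rows)
result = reconcile(source, target, "orders", "amount_cents")
```

Matching profiles don’t prove the data is identical, but a mismatch is a cheap, early signal that the approach needs another look before the real migration.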
AWS, for example, recommends that those involved in a migration “determine the AWS service configuration required for your performance needs and validate that its performance is better than on premises.” In other words, is this specific approach the right one for you?
Migrate at non-peak hours
In an ideal world, every data migration will go perfectly smoothly…but we don’t live in an ideal world. By migrating at non-peak hours, you reduce the risk of upsetting customers if you need to take services down temporarily while you restore or recover.
If, say, your current systems update at the end of the business day (rather than in real time), then migrating at non-peak hours could also be useful for ensuring complete data sets.
Try to migrate in the middle of the day, and you risk stakeholders trying to change or access data mid-migration. Not to mention the nightmare of needing to roll things back during the working day if the migration doesn’t go as planned…
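If migrations are kicked off by a script, a small guard can stop anyone from starting one outside the agreed window. This sketch assumes a hypothetical 22:00 to 05:00 window in your users’ local time. The only subtle part is that such a window crosses midnight, so a single range comparison can’t check it.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def guard_start(now: time, start: time = time(22, 0), end: time = time(5, 0)) -> None:
    """Raise before a migration script does any work outside the off-peak window."""
    if not in_window(now, start, end):
        raise RuntimeError(f"{now} is outside the migration window {start}-{end}")


# Typical use at the top of a migration script: guard_start(datetime.now().time())
```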
Have a rollback and recovery plan
Sometimes a data migration is enough of a mess that it’s better to roll back than to try to fix it. Provided you’ve followed the appropriate steps, such as securing data and creating timely backups, it’s not a huge deal to try the migration again.
Different cloud providers have different approaches to recovery: Snowflake offers Time Travel for querying and restoring historical data within a retention period, followed by a Fail-safe period for disaster recovery, and AWS outlines four different strategies for rolling back from a migration.
This is an area you’ll need to pay close attention to and, depending on the tool(s) you’re using, take appropriate measures yourself (like comprehensive backups) before migrating.
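For changes that fit inside a single database transaction, the cheapest rollback plan is to not commit until validation passes. The sketch below uses Python’s built-in `sqlite3`, whose connection context manager commits on success and rolls back when an exception escapes. `migrate_with_rollback` and the row-count rule are hypothetical. The sketch covers only data changes (the table is created and committed beforehand), so it complements backups and snapshots rather than replacing them.

```python
import sqlite3


def migrate_with_rollback(conn, migrate, validate) -> bool:
    """Run `migrate` in one transaction and keep it only if `validate` passes."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            migrate(conn)
            if not validate(conn):
                raise ValueError("post-migration validation failed")
        return True
    except (ValueError, sqlite3.Error):
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
conn.commit()


def partial_migration(c):
    # Simulates a load that silently drops a row.
    c.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10000), (2, 1999)])


def full_migration(c):
    c.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10000), (2, 1999), (3, 550)])


def expect_three_rows(c):
    return c.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 3
```

Because the failed attempt leaves the table empty again, the corrected migration can simply be re-run.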
The Future of Data Migration
Right now we’re seeing lots of businesses move data from on-premises solutions to the cloud. But data migration won’t become less relevant once storing data in the cloud from inception becomes the status quo.
When that happens, the focus will shift from migrating on-premises data to the cloud (where we are now) to optimizing storage and migration within the cloud. That brings a risk of vendor lock-in, since providers won’t want to make it too easy to switch to a cheaper solution. But providers are also likely to compete on making migration easier, with built-in tools to back up relevant data automatically, pull off secure migrations, and so on.
Whether you’re migrating to a new cloud data solution or deploying a new tool, data reliability is as important as it’s ever been. In fact, given how the cloud and real-time data often go hand in hand, one could even argue it’s more important now than it’s ever been.
When you know you can rely on your data, validating successful migrations is easier. You can also see, at a glance, how moving data will impact downstream dependencies, and identify critical assets that you may need to keep a closer eye on. And having data that’s consistent and complete will always improve the chances of your next migration being a success.
Interested in learning how to keep data quality high once you’ve mitigated data migration risks and are in your new environment? Talk to us.