Enterprise Data Warehouse (EDW) Migration use cases are always complex in nature. Data architects and Data engineers spend the majority of time in analyzing the source data, identifying the ingesting patterns, and defining the transformation logic during the design phase of the migration. Even after ingesting the source data and converting the workloads into the modern data platform, the migration projects can still slow down during the data validation phase. So it's important to understand the pitfalls and have a mitigation plan.
Data Validation is an important phase for the success of the project and we shall discuss how to conquer this last-mile of migration!
Migration project starts with the discovery phase in which the source system is analyzed using Data warehouse profilers that can provide a summary of the total inventory along with its complexity. At this stage the migration strategy is defined that would pick one of the following patterns:
Along with it other details at each step of the migration process is defined that includes:
Key Takeaway - It’s important to recognize the value of having a clear strategy towards data validation in the life cycle of a migration project and define the validation approach as early as possible |
EDW migration flows through multiple steps starting from Ingestion and Transformation to querying data, visualization, and serving. Here is a reference architecture depicting how different components of a modern data platform aligns with the migration steps:
Migration steps include:
Key Takeaway - The Ingested and Transformed data along with the visualization dashboards has to be independently validated to make sure every step of migration matches and aligns with the source system. |
Data validation on a migration project happens at 3 stages:
Key Takeaway - The depth of validation varies from customer to customer and also depends on the level of maturity of the data platform. We can find a good balance in between all stages of validation by prioritizing Gold layer data validation to be at each row/column level, simplifying the Bronze layer data validation by defining the KPIs & matching the row counts that can be either total number of orders, daily inventory, parts count etc, and keeping the BI Report validation to minimum/basic checks. |
Data Validation can be performed using one of the following tools:
One or a combination of the above tools can be used to validate the data at the initial 2 stages in the validation approach.
Key Takeaway - An accelerator would always provide a nudge required for speeding the process and reducing the total time taken to complete validations. I have noticed more than 30% of productivity gain and a big boost in customer confidence when we use an Accelerator for validation. It is recommended to identify and customize the accelerators that can automate and simplify the validation process. |
As with any phase of Migration, Data validation has its own challenges but often ignored till it blocks the whole process. Understanding some of these challenges ahead of time and setting a mitigation plan would keep the migration process on path to success.
Challenges |
Mitigation Plan |
Identify the level of validation required based on the project complexity |
Socializing the different levels of validation approach with the stakeholders and scoping out what is required for the project during the initial phase of migration |
Identify the right tools and define the validation approach ahead of time |
Accelerators can bring in the required support to enhance the validation process and reduce the total time to complete the validation successfully. Identify the accelerator as part of the PoC and have the customizations required ready |
Multiple execution cycles of validating the same set of data |
Automate Validation to simplify reruns |
The back and forth between multiple teams involved in validation might be challenging to keep the project on track |
Keeping all the SMEs in check and aligned for signoff |
Evolving the validation process |
Keep a log of all the exceptions captured during the validation process and integrate it back to the automation process |
Data validation sounds like a simple step in the migration process and oftentimes is overseen in the design phase. This can lead to a potential slowdown in the overall migration process. Hence understanding the challenges and defining a validation approach early in the project lifecycle would help in navigating through this phase easily and thus decrease the time taken to complete a successful migration.
import datacompy |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.