Schema update issue in DLT
03-06-2025 12:00 PM
I have a pipeline in Databricks with this flow:
SQL SERVER (Source) -> Staging (Parquet) -> Bronze (DLT) -> Silver(DLT) -> Gold (DLT)
The pipeline had been running smoothly for months, but recently there was a schema update at my source and the primary key of one of my tables changed from int to bigint.
After the schema change happened at the source, it got reflected in my pipeline too: if I look at the schema of the table in all layers, it shows bigint. So I believe "mergeSchema" handled the schema evolution and the data type of the column was updated from int to bigint. But it caused very strange behavior. The data type was updated, but the next time the pipeline ran after the change, it wiped all of the previous data for this column (millions of records). So now the pipeline is running and fetching the new data correctly (I assume because the data type now matches the source, i.e. bigint), but all of my previous data for this column is now null in the table. And since this column is part of a primary key, it messed up my data really badly, because millions of records are effectively lost.
What exactly could be the reason behind this missing data, and how can I fix it? I need to read the lost records back from my source to make the data correct again.
As a side note, the pipeline is incremental and runs once daily.
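For reference, here is a minimal diagnostic sketch of how one could check whether the old column values are still recoverable from the Delta table history, assuming the bronze table is a Delta table and the pre-change version is still within the retention window. The table name, version number, and column name below are placeholders, not the actual pipeline objects:

```python
# Minimal diagnostic sketch (table name, column name, and version are placeholders).
# Goal: find the table version written just before the int -> bigint change and
# check whether the original values can still be read via time travel.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

bronze_table = "catalog.schema.bronze_my_table"  # placeholder table name

# Inspect the write history; look for the operation that changed the schema
# and note the version number immediately before it.
spark.sql(f"DESCRIBE HISTORY {bronze_table}").select(
    "version", "timestamp", "operation", "operationParameters"
).show(truncate=False)

# Read the table as of the version before the schema change. If the old values
# are still present here, they can be used for a backfill instead of
# re-reading everything from SQL Server.
old_snapshot = spark.read.option("versionAsOf", 42).table(bronze_table)  # 42 is a placeholder
old_snapshot.select("my_pk_column").show(5)
```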
03-07-2025 07:06 AM
Hi @Alberto_Umana, sorry to bother you. You helped me with my last DLT problem, so I was wondering if you could help with this one too, if you have any idea.
Thank you!
03-07-2025 07:15 AM
Hi @Fatimah-Tariq,
Thanks for your question.
When a schema change alters the data type of a column, and that column is part of a primary key, Delta Lake's schema evolution mechanism can handle it by updating the schema. However, the change can sometimes inadvertently cause data inconsistencies, such as setting previously written values to null if the old data cannot be cast cleanly into the new data type.
Using MERGE INTO together with auto optimize and auto compaction can help ensure smooth transitions between schema versions.
I will check internally what the best option would be to address this, or whether an ALTER back to the previous data type is the way to go.
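As an illustration of the backfill logic only (not a confirmed fix), here is a hedged sketch assuming the pre-change values are still readable from an earlier version of the bronze table and that the affected table can be written to outside the DLT pipeline; for DLT-managed tables, a full refresh from a corrected source may be the supported path instead. All table names, column names, and the version number are placeholders:

```python
# Hedged backfill sketch: restore only the nulled-out PK values by matching rows
# on another stable key. All names and the version number are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

target_table = "catalog.schema.silver_my_table"  # table whose PK column is now null

# Rows as they looked before the schema change, with the key cast to the new type
recovered = (
    spark.read.option("versionAsOf", 42)          # placeholder pre-change version
    .table("catalog.schema.bronze_my_table")
    .select(
        F.col("business_key"),                    # placeholder stable matching column
        F.col("my_pk_column").cast("bigint").alias("my_pk_column"),
    )
)
recovered.createOrReplaceTempView("recovered_rows")

# Fill in only the rows whose PK column was nulled out; leave everything else untouched
spark.sql(f"""
    MERGE INTO {target_table} AS t
    USING recovered_rows AS s
    ON t.business_key = s.business_key
    WHEN MATCHED AND t.my_pk_column IS NULL THEN
      UPDATE SET t.my_pk_column = s.my_pk_column
""")
```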
03-07-2025 09:22 AM
Thank you so much for responding. I had to bother you because the issue is high priority, since it messed up the data.
I'll be waiting for your answer.
03-10-2025 02:00 AM
Hi @Alberto_Umana, is there any word on how to fix my data and bring all the records back into the pipeline tables?

