I have a pipeline in Databricks with this flow:
SQL Server (source) -> Staging (Parquet) -> Bronze (DLT) -> Silver (DLT) -> Gold (DLT)
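For context, the Bronze layer is a DLT table that reads the staging Parquet files. A simplified sketch is below; the real definition is incremental, and the table name and path here are placeholders:

```python
import dlt
from pyspark.sql import functions as F

# Simplified Bronze definition (placeholder table name and staging path;
# the real pipeline reads incrementally).
@dlt.table(name="bronze_my_table")
def bronze_my_table():
    return (
        spark.read
        .option("mergeSchema", "true")        # schema evolution from staging
        .parquet("/mnt/staging/my_table/")    # placeholder path
        .withColumn("_ingested_at", F.current_timestamp())
    )
```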
The pipeline had been up and running smoothly for months, but recently there was a schema update at the source and the data type of a primary key column in one of my tables changed from int to bigint.
After the schema change at the source, it was reflected in my pipeline too: if I look at the table's schema in every layer, the column now shows bigint. So I believe "mergeSchema" handled the schema evolution and the column's data type was updated from int to bigint. But it caused very strange behavior: the data type was updated, yet the next time the pipeline ran after the change, all of the previous values in this column were wiped out (we're talking millions of records).

The pipeline now runs and fetches new data correctly (I assume because the data type matches the source, i.e. bigint), but all of my previous data for this column is NULL in the table. And since this column is part of a primary key, it has messed up my data badly, because millions of records are effectively lost.
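For what it's worth, this is how I confirmed the damage (table and column names are placeholders): the historical rows are still present, but the key column is now NULL for all of them.

```python
# Placeholder names; counts the historical rows whose key column is now NULL.
nulls = (
    spark.table("silver.my_table")
    .filter("my_pk_col IS NULL")
    .count()
)
print(f"Rows with NULL key: {nulls}")   # comes back in the millions
```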
What exactly could be the reason behind this missing data, and how can I fix it? I need to read the affected records back from the source to make the data correct again.
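What I have in mind is roughly the sketch below: re-read the source table over JDBC and MERGE the key column back in, joining on a column that survived the change. All connection details, table names, and the join column are placeholders, and I'm not sure whether updating a DLT-managed table outside the pipeline like this is even a good idea, which is part of my question.

```python
from delta.tables import DeltaTable

# Everything here is a placeholder: connection details, table names,
# the key column ("my_pk_col"), and the surviving join column ("other_key_col").
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
    .option("dbtable", "dbo.my_table")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

target = DeltaTable.forName(spark, "bronze.my_table")

(
    target.alias("t")
    .merge(source_df.alias("s"), "t.other_key_col = s.other_key_col")
    .whenMatchedUpdate(set={"my_pk_col": "s.my_pk_col"})   # restore only the key column
    .execute()
)
```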
As a side note, the pipeline is incremental and runs once daily.