cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Moving data to a delta table keeping the old surrogate ids intact

elgeo
Valued Contributor II

Hello experts! We have a table in our current system that we need to move it (one-off) to a delta in Databricks keeping its Ids (surrogate keys) intact. We think to of the following steps:

1. create a new delta table with a "BIGINT GENERATED BY DEFAULT AS IDENTITY" column for the ID

2. move the current data to the new delta table as "GENERATED BY DEFAULT" allows identity insertion to be overridden

3. "ALTER COLUMN ID SYNC IDENTITY" so that new rows that will be inserted from now on will take the next automatically assigned identity value which will start from n (maximum Id in the table) +1

However, the problem with this is that, after the initial load, we cannot alter the surrogate key column to "GENERATED ALWAYS" so that we don't allow overwrites.

We would appreciate your feedback. Thank you in advance

1 REPLY 1

lizou
Contributor II

same here, I submitted an idea in the azure databricks portal

https://feedback.azure.com/d365community/idea/d403303c-6761-ed11-a81b-000d3ae5ae95

SET IDENTITY_INSERT ON

when a column is defined as GENERATED ALWAYS, we often need to reload data with exact same key (commonly a key is referenced by other tables)

similar to SET IDENTITY_INSERT ON in SQL SERVER, it will be very helpful to do the same in delta table.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.