Moving data to a delta table keeping the old surrogate ids intact

elgeo
Valued Contributor II

Hello experts! We have a table in our current system that we need to move it (one-off) to a delta in Databricks keeping its Ids (surrogate keys) intact. We think to of the following steps:

1. create a new delta table with a "BIGINT GENERATED BY DEFAULT AS IDENTITY" column for the ID

2. move the current data to the new delta table as "GENERATED BY DEFAULT" allows identity insertion to be overridden

3. "ALTER COLUMN ID SYNC IDENTITY" so that new rows that will be inserted from now on will take the next automatically assigned identity value which will start from n (maximum Id in the table) +1

However, the problem with this is that, after the initial load, we cannot alter the surrogate key column to "GENERATED ALWAYS" so that we don't allow overwrites.

We would appreciate your feedback. Thank you in advance

lizou
Contributor III

same here, I submitted an idea in the azure databricks portal

https://feedback.azure.com/d365community/idea/d403303c-6761-ed11-a81b-000d3ae5ae95

SET IDENTITY_INSERT ON

when a column is defined as GENERATED ALWAYS, we often need to reload data with exact same key (commonly a key is referenced by other tables)

similar to SET IDENTITY_INSERT ON in SQL SERVER, it will be very helpful to do the same in delta table.