cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Moving data to a delta table keeping the old surrogate ids intact

elgeo
Valued Contributor II

Hello experts! We have a table in our current system that we need to move it (one-off) to a delta in Databricks keeping its Ids (surrogate keys) intact. We think to of the following steps:

1. create a new delta table with a "BIGINT GENERATED BY DEFAULT AS IDENTITY" column for the ID

2. move the current data to the new delta table as "GENERATED BY DEFAULT" allows identity insertion to be overridden

3. "ALTER COLUMN ID SYNC IDENTITY" so that new rows that will be inserted from now on will take the next automatically assigned identity value which will start from n (maximum Id in the table) +1

However, the problem with this is that, after the initial load, we cannot alter the surrogate key column to "GENERATED ALWAYS" so that we don't allow overwrites.

We would appreciate your feedback. Thank you in advance

1 REPLY 1

lizou
Contributor II

same here, I submitted an idea in the azure databricks portal

https://feedback.azure.com/d365community/idea/d403303c-6761-ed11-a81b-000d3ae5ae95

SET IDENTITY_INSERT ON

when a column is defined as GENERATED ALWAYS, we often need to reload data with exact same key (commonly a key is referenced by other tables)

similar to SET IDENTITY_INSERT ON in SQL SERVER, it will be very helpful to do the same in delta table.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group