Here's my use case: I'm migrating from an old DWH into Databricks. When moving dimension tables into Databricks, I'd like the old SKs (surrogate keys) to be preserved, while creating the SK column as an IDENTITY column, so new dimension values get a new SK that is unique across the old SKs coming from the old DWH.
So, if I have a table d_something, with 2 columns (sk, bk) containing one row:
sk = 12, bk = ABC
I'll copy this into a new Databricks Delta table, and when I insert a new row:
INSERT INTO d_something (bk)
VALUES ('DEF')
a new SK should be generated, so:
sk = 12, bk = ABC
sk = 13, bk = DEF
(doesn't have to be sequential, just unique).
By this: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-alter-table.html
Based on that, I imagine it should be possible to create the table, populate it manually with the old SKs, and then alter the SK column into an IDENTITY column (using SYNC IDENTITY).
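For reference, the statement I'd expect to run after backfilling the old SKs, going by that ALTER TABLE page, is something like the following (table and column names from my example; I haven't confirmed whether this works when the column wasn't created as an identity column in the first place):

ALTER TABLE d_something ALTER COLUMN sk SYNC IDENTITY;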
So far I managed to create a fresh table with IDENTITY column, such as:
CREATE TABLE sk_get_test_1 (
sk BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
bk STRING
)
But manually populating the SK column returns an error saying that IDENTITY columns cannot be manually populated.
Can I create it as a regular column, populate it with the old SKs, and then alter it into an IDENTITY column?
Any other ideas here?
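One direction I'm considering, based on the CREATE TABLE docs (not yet tested end to end): declaring the column as GENERATED BY DEFAULT AS IDENTITY instead of GENERATED ALWAYS. As far as I can tell that variant accepts explicit values, so the old SKs could be loaded directly and the counter re-synced afterwards:

-- Identity column that also accepts explicit values (GENERATED BY DEFAULT)
CREATE TABLE d_something (
  sk BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1),
  bk STRING
);

-- Backfill the old surrogate keys explicitly
INSERT INTO d_something (sk, bk) VALUES (12, 'ABC');

-- Move the identity high-water mark past the backfilled values
ALTER TABLE d_something ALTER COLUMN sk SYNC IDENTITY;

-- New rows should then get fresh, unique SKs
INSERT INTO d_something (bk) VALUES ('DEF');

Would this be the recommended pattern, or is there a catch with SYNC IDENTITY here?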
Thanks!!