02-10-2026 04:57 AM - edited 02-10-2026 04:58 AM
Hi All,
I have few bronze streaming tables from which I want to load into silver target tables based on some joins. I am doing this by creating a streaming table and using a view to retrieve fields from each bronze table. After that I am using AUTO-CDC to perform CDC logic. I need to populate Identity columns in our silver tables which will be our surrogate keys for gold layer.
Its mentioned in the document that identity columns cannot be used with AUTO-CDC target:https://learn.microsoft.com/en-us/azure/databricks/ldp/limitations
Is there any way to populate Identity column for such target table? Any guidance would be appreciated!
Thanks,
Andolina
02-12-2026 03:06 AM
Potential risks of identity columns are:
Yes you can put identity column into the TRACK HISTORY ON parameter.
Best regards,
02-10-2026 05:59 AM
Wouldn't you want to use the natural key from the source? Typically auto cdc use cases (be they Snowflake, Big Query, Fabric, Databricks, or other) are those which are cloning an external source using its source ID, and then when subsequent information is desired, a downstream transformation takes place. In situations where that's not sufficient, then typically it's some form of structured streaming with a message queue relay in between or some other path. It sounds like your use case is just mis-assigning transformation layers.
02-11-2026 07:48 AM
Hi @Andolina ,
Indeed, prefer using natural keys in your use case. Identity columns have limitations, it is better to use natural keys if available.
Alternatively, you can generate surrogate key from the row values itself (with hash or concat function).
If identity columns are strictly necessary, you'll have to generate them in the bronze table, and put it into theTRACK HISTORY ON in AUTO CDC so that changes on that column are not tracked.
Hope it helps
02-11-2026 10:03 AM
Thank you both for replying. We are indeed using the natural keys from source for auto-cdc keys. The Identity column is not used in auto-cdc but needs to be populated as an added column in my silver tables. Users need that to join multiple tables in gold layer. Now the problem I am facing is with auto-cdc I am not able to populate that directly in silver as there is no such option.
However, I found an alternative way where we create the tables upfront in a separate script. Looks like I cannot do this in python as dp.create_streaming_table() didn't support in my case (may be there is some way here too), but I am able to achieve it using a .sql script. The .sql script needs to be part of the declarative pipeline though.
Example:
02-12-2026 03:06 AM
Potential risks of identity columns are:
Yes you can put identity column into the TRACK HISTORY ON parameter.
Best regards,