Adding new columns to a Delta Live table in a CDC process

ameya — Wed, 26 Jun 2024 21:46:48 GMT

I am new to databricks and still learning.

I am trying to do a CDC on a table.

APPLY CHANGES INTO LIVE.table1 FROM schema2.table2 KEYS (Id) SEQUENCE BY orderByColumn COLUMNS * EXCEPT (col1, col2) STORED AS SCD TYPE 1 ;

table1 is in schema1 and is silver layer. table2 is in schema2 and is bronze layer.

table2 may have different, new columns than table1. how do I insert or update these new columns into table1. I am assuming that would give us schema mismatch errors.
I want to insert some new columns based on the whether the operation is an insert operation or an update operation. If it is an update operation, I want to insert a column called 'LastModifiedDate' with the current timestamp and if it is an insert operation I want to insert a column called 'CreatedDate' as current timestamp.

How can I do these 2 things?

Thanks in advance.

Re: Adding new columns to a Delta Live table in a CDC process

raphaelblg — Mon, 01 Jul 2024 17:39:29 GMT

Hi @ameya ,

Scenario 1: Enabling Delta schema evolution in your table or at DLT pipeline level should suffice for the scenario of new fields being added to the schema.

Scenario 2: The INSERT statement doesn't support schema evolution as described in Delta schema evolution. You can explore the MERGE INTO statement for INSERT/UPDATE scenarios, merges will support schema evolution but I believe there is no way to set different schemas based on the underlying operation.

topic Re: Adding new columns to a Delta Live table in a CDC process in Data Engineering

Adding new columns to a Delta Live table in a CDC process

Re: Adding new columns to a Delta Live table in a CDC process