cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Databricks Academy Learners
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Setting Column Only on Insert with DLT's apply_changes CDC Merge

Ru
New Contributor

Hi Databricks Community,

We've encountered an issue with setting a column only on insert when using DLT's apply_changes CDC merge functionality. It's important to note that this capability is available when using the regular Delta merge operation, specifically through whenNotMatchedInsert option.

If we want to setup a `CreatedDate` on initial insert. Is there any option available to solve this when using apply_changes, similar to how it's achievable with the regular Delta merge operation (https://docs.databricks.com/en/delta/merge.html)?

1 REPLY 1

Kaniz
Community Manager
Community Manager

Hi @RuWhen using Delta Lakeโ€™s apply_changes CDC merge functionality, you can achieve similar behaviour to setting a CreatedDate during initial insert. Although the whenNotMatchedInsert option is not directly available, you can still accomplish this by following a slightly different approach.

Hereโ€™s how you can handle it:

  1. Initial Insert with CreatedDate:

    • Before applying changes, create a new column called CreatedDate in your target Delta table.
    • Set the CreatedDate value to the current timestamp for newly inserted records.
  2. Apply Changes Using apply_changes:

    • Use the apply_changes functionality to merge data from your source table into the target table.
    • This will handle updates and inserts efficiently.
  3. Custom Logic for CreatedDate:

    • In your apply_changes process, when processing new records (i.e., whenNotMatched), set the CreatedDate column to the current timestamp.

Hereโ€™s a high-level example in SQL:

MERGE INTO target
USING source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET
    target.column1 = source.column1,
    target.column2 = source.column2
WHEN NOT MATCHED THEN
  INSERT (id, column1, column2, CreatedDate)
  VALUES (source.id, source.column1, source.column2, current_timestamp())

Remember to adjust the column names (id, column1, column2, etc.) and adapt this logic to your specific use case. The key point is to manually set the CreatedDate during the initial insert for new records.

For more details, you can refer to the official Azure Databricks documentation on Delta Lake merge.

Feel free to reach out if you need further assistance! ๐Ÿ˜Š

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.