cancel
Showing results for 
Search instead for 
Did you mean: 
Databricks Academy Learners
cancel
Showing results for 
Search instead for 
Did you mean: 

Setting Column Only on Insert with DLT's apply_changes CDC Merge

Ru
New Contributor

Hi Databricks Community,

We've encountered an issue with setting a column only on insert when using DLT's apply_changes CDC merge functionality. It's important to note that this capability is available when using the regular Delta merge operation, specifically through whenNotMatchedInsert option.

If we want to setup a `CreatedDate` on initial insert. Is there any option available to solve this when using apply_changes, similar to how it's achievable with the regular Delta merge operation (https://docs.databricks.com/en/delta/merge.html)?

1 REPLY 1

Kaniz
Community Manager
Community Manager

Hi @RuWhen using Delta Lake’s apply_changes CDC merge functionality, you can achieve similar behaviour to setting a CreatedDate during initial insert. Although the whenNotMatchedInsert option is not directly available, you can still accomplish this by following a slightly different approach.

Here’s how you can handle it:

  1. Initial Insert with CreatedDate:

    • Before applying changes, create a new column called CreatedDate in your target Delta table.
    • Set the CreatedDate value to the current timestamp for newly inserted records.
  2. Apply Changes Using apply_changes:

    • Use the apply_changes functionality to merge data from your source table into the target table.
    • This will handle updates and inserts efficiently.
  3. Custom Logic for CreatedDate:

    • In your apply_changes process, when processing new records (i.e., whenNotMatched), set the CreatedDate column to the current timestamp.

Here’s a high-level example in SQL:

MERGE INTO target
USING source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET
    target.column1 = source.column1,
    target.column2 = source.column2
WHEN NOT MATCHED THEN
  INSERT (id, column1, column2, CreatedDate)
  VALUES (source.id, source.column1, source.column2, current_timestamp())

Remember to adjust the column names (id, column1, column2, etc.) and adapt this logic to your specific use case. The key point is to manually set the CreatedDate during the initial insert for new records.

For more details, you can refer to the official Azure Databricks documentation on Delta Lake merge.

Feel free to reach out if you need further assistance! 😊