Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Recommended approach for handling deletes in a Delta table

WiliamRosa
New Contributor III

What is the recommended approach for handling deletes in a Delta table?
I have a table in MySQL (no soft delete flag) that I read and write into Azure as a Delta table. My current flow is:
- If an ID exists in both MySQL and the Delta table → update the record in Delta.
- If an ID exists in MySQL but not in Delta → insert it into Delta.
The challenge is: how do I handle deletes? Specifically, if an ID exists in the Delta table but does not exist in the latest MySQL extract, I want to remove it from Delta. What's the best way to implement this logic?

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa
1 ACCEPTED SOLUTION


nayan_wylde
Honored Contributor II

The recommended way of handling CDC (updates, inserts, and deletes) in Databricks is the MERGE INTO command.

https://docs.databricks.com/aws/en/sql/language-manual/delta-merge-into

If you are using SQL:

-- Delete all target rows that have a match in the source table.
MERGE INTO target USING source
  ON target.key = source.key
  WHEN MATCHED THEN DELETE

-- Conditionally update target rows that have a match in the source table using the source value.
MERGE INTO target USING source
  ON target.key = source.key
  WHEN MATCHED AND target.updated_at < source.updated_at THEN UPDATE SET *

-- Multiple MATCHED clauses: conditionally delete matched target rows and update two columns for all other matched rows.
MERGE INTO target USING source
  ON target.key = source.key
  WHEN MATCHED AND target.marked_for_deletion THEN DELETE
  WHEN MATCHED THEN UPDATE SET target.updated_at = source.updated_at, target.value = DEFAULT
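
For the exact case described in the question (an ID exists in Delta but is missing from the latest MySQL extract), the delete can be folded into the same MERGE with a WHEN NOT MATCHED BY SOURCE clause (available in Databricks Runtime 12.2 LTS and above). A minimal sketch, assuming the latest extract is staged as a view named source and the key column is id:

-- Upsert from the latest extract and delete target rows that no longer exist in it.
MERGE INTO target USING source
  ON target.id = source.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
  WHEN NOT MATCHED BY SOURCE THEN DELETE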

 

If you are using Python:

from delta.tables import DeltaTable

# 'deltaTable' is the target Delta table and 'sourceDF' is a DataFrame
# holding the latest MySQL extract to be merged.

deltaTable = DeltaTable.forPath(spark, "/path/to/your/delta/table")

(deltaTable.alias("target")
    .merge(sourceDF.alias("source"), "target.key_column = source.key_column")
    .whenMatchedUpdateAll()          # update rows present in both
    .whenNotMatchedInsertAll()       # insert rows only in the source
    .whenNotMatchedBySourceDelete()  # delete rows missing from the source
    .execute())
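
Note: whenNotMatchedBySourceDelete() and the SQL WHEN NOT MATCHED BY SOURCE clause require Databricks Runtime 12.2 LTS or above (Delta Lake 2.3+). On older runtimes you would instead anti-join the Delta table against the latest extract and issue a separate DELETE for the missing keys.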

