<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to implement MERGE operations in Lakeflow Declarative Pipelines in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/154645#M54118</link>
    <description>&lt;P&gt;Could you please provide an actual sample of how to do this?&lt;/P&gt;</description>
    <pubDate>Wed, 15 Apr 2026 12:27:45 GMT</pubDate>
    <dc:creator>de-qrosh</dc:creator>
    <dc:date>2026-04-15T12:27:45Z</dc:date>
    <item>
      <title>How to implement MERGE operations in Lakeflow Declarative Pipelines</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/133249#M49761</link>
      <description>&lt;P&gt;Hey everyone,&lt;/P&gt;&lt;P&gt;We’ve been using Autoloader extensively for a while, and now we’re looking to transition to full Lakeflow Declarative Pipelines. From what I’ve researched, the &lt;STRONG&gt;reader&lt;/STRONG&gt; part seems straightforward and clear.&lt;/P&gt;&lt;P&gt;For the &lt;STRONG&gt;writer&lt;/STRONG&gt;, I understand that I can use a sink and provide the necessary options. What I’m not fully clear on is how to implement the &lt;STRONG&gt;MERGE logic&lt;/STRONG&gt;. In my current Autoloader setup, I handle this via forEachBatch.&lt;/P&gt;&lt;P&gt;How should this be approached in the Lakeflow Declarative Pipelines framework? Could I use forEachBatch? I did not find any documentation on the topic.&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Mon, 29 Sep 2025 12:49:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/133249#M49761</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-09-29T12:49:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement MERGE operations in Lakeflow Declarative Pipelines</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/133271#M49772</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175553"&gt;@yit&lt;/a&gt;&amp;nbsp;&lt;SPAN&gt;Lakeflow supports upsert/merge semantics natively for Delta tables unlike ForEachBatch&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Instead of writing custom&amp;nbsp;forEachBatch&amp;nbsp;code, you declare the merge keys and update logic in your pipeline configuration. Lakeflow will automatically generate the necessary MERGE statements and handle upserts for you.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;e.g.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;sinks:&lt;BR /&gt;my_delta_sink:&lt;BR /&gt;type: delta&lt;BR /&gt;path: /mnt/delta/my_table&lt;BR /&gt;merge:&lt;BR /&gt;keys: ["id"] # columns to match for upsert&lt;BR /&gt;whenMatched: update&lt;BR /&gt;whenNotMatched: insert&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Sep 2025 14:06:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/133271#M49772</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-09-29T14:06:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement MERGE operations in Lakeflow Declarative Pipelines</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/154645#M54118</link>
      <description>&lt;P&gt;Could you please provide an actual sample of how to do this?&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2026 12:27:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/154645#M54118</guid>
      <dc:creator>de-qrosh</dc:creator>
      <dc:date>2026-04-15T12:27:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement MERGE operations in Lakeflow Declarative Pipelines</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/154682#M54122</link>
      <description>&lt;DIV&gt;Use APPLY CHANGES INTO (SQL) or dlt.apply_changes() (Python). This is the declarative replacement for foreachBatch MERGE logic in pipelines&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt
from pyspark.sql.functions import col

@dlt.table(name="bronze_events")
def bronze_events():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("abfss://container@account.dfs.core.windows.net/events/"))

dlt.apply_changes(
    target="customer",
    source="bronze_events",
    keys=["customer_id"],
    sequence_by=col("event_ts"),
    apply_as_deletes=col("op") == "DELETE",
    stored_as_scd_type=1
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2026 19:17:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-operations-in-lakeflow-declarative/m-p/154682#M54122</guid>
      <dc:creator>nayan_wylde</dc:creator>
      <dc:date>2026-04-15T19:17:51Z</dc:date>
    </item>
  </channel>
</rss>

