cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

DLT Merge tables into Delta

JothyGanesan
New Contributor II

We are trying to load a Delta table from streaming tables using DLT. This target table needs a MERGE of 3 source tables. But when we use the DLT command with merge it says Merge is not supported. Is this anything related to DLT version? Please help us on this

3 REPLIES 3

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @JothyGanesan,

Delta Live Tables (DLT) currently does not support the MERGE operation directly within a DLT pipeline. This limitation is not related to the DLT version but rather a general restriction in the functionality of DLT.

 

However, you can achieve the desired outcome by using a combination of foreachBatch and MERGE within a streaming query. Here is an example of how you can use foreachBatch to perform a MERGE operation

 

from delta.tables import *

 

def upsert_to_delta(microBatchOutputDF, batchId):

    deltaTable = DeltaTable.forName(spark, "target_table_name")

    deltaTable.alias("t").merge(

        microBatchOutputDF.alias("s"),

        "s.key = t.key"

    ).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

 

streamingDF.writeStream.foreachBatch(upsert_to_delta).outputMode("update").start()

 

In this example, upsert_to_delta is a function that performs the MERGE operation using the Delta Lake APIs. The foreachBatch method is used to apply this function to each micro-batch of the streaming DataFrame

JothyGanesan
New Contributor II

@Alberto_Umana Thank you for the quick reply. But how are we to use the above, this looks like structured streaming with CDF mode.

But currently our tables being in Unity catalog, finding the start version and end version is taking huge time as the tables are near real time data. So, we wanted to read the source using DLT instead of CDF mode. So, reading the source with DLT, how should we use the above option of ForEachBatch and Merge? Could you please guide?

RiyazAli
Valued Contributor II

Hey @JothyGanesan 
Please take a look at the Apply Changes API - https://docs.databricks.com/en/delta-live-tables/cdc.html
This is a replacement of MERGE INTO in Databricks.

Cheers!

Riz

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group