cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Update Delta Table with Apache Spark connector

mr3
New Contributor

Hi everyone. I'd like to ask a question about updating Delta tables using the Apache Spark connector.

Let's say I have two tables: one is a product dimension table with items from my shop, and the other contains a single column with the IDs of the products I want to update. According to the documentation (Delta Update), Iโ€™ve used a merge operation as follows:

deltaTable
  .as("current")
  .merge(
    updates.as("updates"),
    "current.product_id = updates.product_id"
  )
  .whenMatched
  .updateExpr(Map("description" -> "null"))
  .execute()

My question is: is it okay to use the merge operation strictly for updates, or is it recommended to always include an insert clause as well? Also, is it possible to use the 'update' operation described in the documentation to modify one table based on another?

2 REPLIES 2

-werners-
Esteemed Contributor III

You can definitely use it for updates only (so no inserts). However, like that you will never see new products in your target table, hence the inserts. (a new dimension id being created in the dim table which needs to be created in the other table).
If that is ok for you, then you can use update only.

About your second question: I am quite not sure what you mean, but the merge only needs a few things:
source data (called 'updates' in your example), target data ('current') and a merge condition.

updates is a dataframe which can be a table.

Or perhaps you want to use CDC on the source table to determine what has to be changed in the target table?
That is certainly possible if the source table is a delta lake table and has change data feed enabled.

szymon_dybczak
Esteemed Contributor III

Hi @mr3 ,

Yes, itโ€™s perfectly fine to use a MERGE operation solely for updates. The UPDATE statement has many limitations. It doesn't support neither UPDATE FROM nor subqueries. This creates many limitations. There are situations where we would like to update records based on the result of one query, either in a different table or in the same table. 

MERGE is purpose-built for this kind of operation when you want to update Delta table based on another.

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now