<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SQL Update Join in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27440#M19312</link>
    <description>&lt;P&gt;Thanks for the answers, guys. I went with &lt;A href="https://users/31623/leedabee.html" alt="https://users/31623/leedabee.html" target="_blank"&gt;@&lt;/A&gt;Lee's suggestion, because we need the code to run in SQL, but I will test the Python code @Bisharath Sthapit provided later on to see if there are any performance gains.&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Mike&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 09 Dec 2019 16:59:55 GMT</pubDate>
    <dc:creator>MikeK_</dc:creator>
    <dc:date>2019-12-09T16:59:55Z</dc:date>
    <item>
      <title>SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27437#M19309</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I'm importing some data and stored procedures from SQL Server into Databricks, and I noticed that UPDATEs with joins are not supported in Spark SQL. What alternative can I use? Here's what I'm trying to do:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;update t1
set t1.colB=CASE WHEN t2.colB&amp;gt;t1.colB THEN t2.colB ELSE t1.colB + t2.colB END
from table1 t1
inner join table2 t2 ON t1.colA=t2.ColA
where t2.colC='XYZ'&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Another thing I was unable to do in Spark SQL is CROSS APPLY and OUTER APPLY; are there any alternatives for those two?&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;
&lt;P&gt;Mike&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Mar 2025 13:28:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27437#M19309</guid>
      <dc:creator>MikeK_</dc:creator>
      <dc:date>2025-03-21T13:28:11Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27438#M19310</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi @Mike K.,&lt;/P&gt;
&lt;P&gt;As Spark has a different, distributed architecture than a traditional RDBMS, you'll certainly find that some functionality isn't supported 1:1. For your particular issue, I would suggest performing the conditional logic and filtering in one temporary structure, and then using the &lt;CODE&gt;MERGE INTO&lt;/CODE&gt; statement: &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html&lt;/A&gt;. This does assume that you are using the (awesome) Delta architecture and storage format, which handles the ACID-compliant transactions under the hood. There are certainly other ways to accomplish this, but here is one such way:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;create temporary view joined as
select dt1.colA, CASE WHEN dt2.colB&amp;gt;dt1.colB THEN dt2.colB ELSE dt1.colB + dt2.colB END as colB
from dt1 inner join dt2 ON dt1.colA=dt2.colA
where dt2.colC='XYZ';
merge into dt1
using joined ON dt1.colA=joined.colA
WHEN MATCHED THEN UPDATE set colB = joined.colB;
&lt;/CODE&gt;&lt;/PRE&gt;
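&lt;P&gt;For the CROSS APPLY / OUTER APPLY part of the question, the closest Spark SQL equivalent for the common table-valued-function case is LATERAL VIEW with a generator such as explode (LATERAL VIEW OUTER plays the role of OUTER APPLY). A minimal sketch, assuming a hypothetical table orders with an array column items:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;-- CROSS APPLY equivalent: one output row per array element
select o.orderId, i.item
from orders o
lateral view explode(o.items) i as item;

-- OUTER APPLY equivalent: also keeps rows whose array is empty or null
select o.orderId, i.item
from orders o
lateral view outer explode(o.items) i as item;&lt;/CODE&gt;&lt;/PRE&gt;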
&lt;P&gt;Here is my working notebook, which shows a reproduction of the failure and the steps taken to reach this final answer:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/3249544772526824/2625620935109214/7123846766950497/latest.html" target="_blank"&gt;https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/3249544772526824/2625620935109214/7123846766950497/latest.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;or on my GitHub: &lt;A href="https://github.com/leedabee/databricks-forum-support-notebooks/tree/master/db-forum-29380" target="_blank"&gt;https://github.com/leedabee/databricks-forum-support-notebooks/tree/master/db-forum-29380&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Hope this gives you some help!&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2019 21:18:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27438#M19310</guid>
      <dc:creator>lee</dc:creator>
      <dc:date>2019-12-04T21:18:00Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27439#M19311</link>
      <description>&lt;P&gt;@Mike K. I have added a solution without using views.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import col, when

temp_df = df2.join(df1, "colA", how="left").select("*", df1.colB.alias("df1_colB")).drop(df1.colB)

df2 = temp_df.withColumn("colB", when((col("colB") &amp;lt;= col("df1_colB")) \
                                      &amp;amp; (temp_df.df1_colB.isNotNull()) \
                                      &amp;amp; (temp_df.colC == 'XYZ'), col("colB") + col("df1_colB")).otherwise(col("colB"))) \
                 .drop("df1_colB")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Link to the full code in a notebook with explanation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/bsthapit/Databricks_Solutions/blob/master/db_29380.ipynb" alt="https://github.com/bsthapit/Databricks_Solutions/blob/master/db_29380.ipynb" target="_blank"&gt;https://github.com/bsthapit/Databricks_Solutions/blob/master/db_29380.ipynb&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 07 Dec 2019 23:30:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27439#M19311</guid>
      <dc:creator>BisharathSthapi</dc:creator>
      <dc:date>2019-12-07T23:30:42Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27440#M19312</link>
      <description>&lt;P&gt;Thanks for the answers, guys. I went with &lt;A href="https://users/31623/leedabee.html" alt="https://users/31623/leedabee.html" target="_blank"&gt;@&lt;/A&gt;Lee's suggestion, because we need the code to run in SQL, but I will test the Python code @Bisharath Sthapit provided later on to see if there are any performance gains.&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Mike&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2019 16:59:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27440#M19312</guid>
      <dc:creator>MikeK_</dc:creator>
      <dc:date>2019-12-09T16:59:55Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27441#M19313</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi!&lt;/P&gt;
&lt;P&gt;This is way late, but did you ever find a solution to the CROSS APPLY part of your question? Is it possible to do CROSS APPLY in Spark SQL, or is there something you can use instead?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2020 12:50:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27441#M19313</guid>
      <dc:creator>LyderIversen</dc:creator>
      <dc:date>2020-12-04T12:50:33Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27442#M19314</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have an issue where I wanted to convert a SQL UPDATE with JOIN query into a MERGE.&lt;/P&gt;
&lt;P&gt;I do not have a primary or unique key in either of the tables on which the join is performed, hence I am getting an error in the merge:&lt;/P&gt;
&lt;P&gt;Error in SQL statement: UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways.&lt;/P&gt;
&lt;P&gt;Can someone please help me resolve this issue?&lt;/P&gt;
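&lt;P&gt;One commonly suggested way to make such a merge deterministic (a sketch only; colA, colB, updatedAt, source_table and target_table are placeholder names) is to deduplicate the source with row_number() so that each target row matches at most one source row:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;create temporary view deduped_source as
select colA, colB from (
  select s.*, row_number() over (partition by colA order by updatedAt desc) as rn
  from source_table s
) where rn = 1;

merge into target_table t
using deduped_source s
on t.colA = s.colA
when matched then update set t.colB = s.colB;&lt;/CODE&gt;&lt;/PRE&gt;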
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2021 05:23:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27442#M19314</guid>
      <dc:creator>BhaktiAbhyankar</dc:creator>
      <dc:date>2021-02-03T05:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/50925#M28922</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/30095"&gt;@BhaktiAbhyankar&lt;/a&gt;, I'm facing the same kind of issue; my table doesn't have any primary keys to join on,&lt;BR /&gt;so I went with 'leftanti' and union. Is this issue resolved for you? If you found a better approach, please share.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Nov 2023 08:13:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/50925#M28922</guid>
      <dc:creator>venkatesh547</dc:creator>
      <dc:date>2023-11-11T08:13:39Z</dc:date>
    </item>
  </channel>
</rss>

