<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SQL Update Join in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27440#M19312</link>
    <description>&lt;P&gt;Thanks for the answers, guys. I went with &lt;A href="https://users/31623/leedabee.html" alt="https://users/31623/leedabee.html" target="_blank"&gt;@&lt;/A&gt;Lee's suggestion, because we need the code to run in SQL, but I will test the Python code @Bisharath Sthapit provided later on to see if there are any performance gains.&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Mike&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 09 Dec 2019 16:59:55 GMT</pubDate>
    <dc:creator>MikeK_</dc:creator>
    <dc:date>2019-12-09T16:59:55Z</dc:date>
    <item>
      <title>SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27437#M19309</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I'm importing some data and stored procedures from SQL Server into Databricks, and I noticed that UPDATEs with joins are not supported in Spark SQL. What alternative can I use? Here's what I'm trying to do:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;update t1
set t1.colB=CASE WHEN t2.colB&amp;gt;t1.colB THEN t2.colB ELSE t1.colB + t2.colB END
from table1 t1
inner join table2 t2 ON t1.colA=t2.ColA
where t2.colC='XYZ'&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Another thing I was unable to do in Spark SQL is CROSS APPLY and OUTER APPLY; are there any alternatives for those two?&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;
&lt;P&gt;Mike&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Mar 2025 13:28:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27437#M19309</guid>
      <dc:creator>MikeK_</dc:creator>
      <dc:date>2025-03-21T13:28:11Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27438#M19310</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi @Mike K.,&lt;/P&gt;
&lt;P&gt;As Spark has a different, distributed architecture than a traditional RDBMS, you'll certainly find that some functionality isn't supported 1:1. For your particular issue, I would suggest performing the conditional logic and filtering in one temporary structure, and then using the &lt;CODE&gt;MERGE INTO&lt;/CODE&gt; statement: &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html&lt;/A&gt;. This does assume that you are using the (awesome) Delta architecture and storage format, which handles the ACID-compliant transactions under the hood. There are certainly other ways to accomplish this, but here is one such way:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;create temporary view joined as
select dt1.colA, CASE WHEN dt2.colB&amp;gt;dt1.colB THEN dt2.colB ELSE dt1.colB + dt2.colB END as colB
from dt1 inner join dt2 ON dt1.colA=dt2.colA
where dt2.colC='XYZ';
merge into dt1
using joined ON dt1.colA=joined.colA
WHEN MATCHED THEN UPDATE set colB = joined.colB;
&lt;/CODE&gt;&lt;/PRE&gt;
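&lt;P&gt;For the CROSS APPLY / OUTER APPLY part of the question, the closest Spark SQL equivalent for the common table-valued-function case is LATERAL VIEW with a generator such as explode (LATERAL VIEW OUTER plays the role of OUTER APPLY). A minimal sketch, assuming a hypothetical table orders with an array column items:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;-- CROSS APPLY equivalent: one output row per array element
select o.orderId, i.item
from orders o
lateral view explode(o.items) i as item;

-- OUTER APPLY equivalent: also keeps rows whose array is empty or null
select o.orderId, i.item
from orders o
lateral view outer explode(o.items) i as item;&lt;/CODE&gt;&lt;/PRE&gt;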
&lt;P&gt;Here is my working notebook, which shows a reproduction of the failure and the steps taken to reach this final answer:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/3249544772526824/2625620935109214/7123846766950497/latest.html" target="_blank"&gt;https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/3249544772526824/2625620935109214/7123846766950497/latest.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;or on my GitHub: &lt;A href="https://github.com/leedabee/databricks-forum-support-notebooks/tree/master/db-forum-29380" target="_blank"&gt;https://github.com/leedabee/databricks-forum-support-notebooks/tree/master/db-forum-29380&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Hope this gives you some help!&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2019 21:18:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27438#M19310</guid>
      <dc:creator>lee</dc:creator>
      <dc:date>2019-12-04T21:18:00Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27439#M19311</link>
      <description>&lt;P&gt;@Mike K. I have added a solution without using views.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import col, when

temp_df = df2.join(df1, "colA", how="left").select("*", df1.colB.alias("df1_colB")).drop(df1.colB)

df2 = temp_df.withColumn("colB", when((col("colB") &amp;lt;= col("df1_colB")) \
                                      &amp;amp; (temp_df.df1_colB.isNotNull()) \
                                      &amp;amp; (temp_df.colC == 'XYZ'), col("colB") + col("df1_colB")).otherwise(col("colB"))) \
                 .drop("df1_colB")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Link to the full code in a notebook with explanation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/bsthapit/Databricks_Solutions/blob/master/db_29380.ipynb" alt="https://github.com/bsthapit/Databricks_Solutions/blob/master/db_29380.ipynb" target="_blank"&gt;https://github.com/bsthapit/Databricks_Solutions/blob/master/db_29380.ipynb&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 07 Dec 2019 23:30:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27439#M19311</guid>
      <dc:creator>BisharathSthapi</dc:creator>
      <dc:date>2019-12-07T23:30:42Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27440#M19312</link>
      <description>&lt;P&gt;Thanks for the answers, guys. I went with &lt;A href="https://users/31623/leedabee.html" alt="https://users/31623/leedabee.html" target="_blank"&gt;@&lt;/A&gt;Lee's suggestion, because we need the code to run in SQL, but I will test the Python code @Bisharath Sthapit provided later on to see if there are any performance gains.&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Mike&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2019 16:59:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27440#M19312</guid>
      <dc:creator>MikeK_</dc:creator>
      <dc:date>2019-12-09T16:59:55Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27441#M19313</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi!&lt;/P&gt;
&lt;P&gt;This is way late, but did you ever find a solution to the CROSS APPLY part of your question? Is it possible to do CROSS APPLY in Spark SQL, or is there something you can use instead?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2020 12:50:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27441#M19313</guid>
      <dc:creator>LyderIversen</dc:creator>
      <dc:date>2020-12-04T12:50:33Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27442#M19314</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have an issue where I wanted to convert a SQL UPDATE with JOIN query into a MERGE.&lt;/P&gt;
&lt;P&gt;I do not have a primary or unique key in either of the tables on which the join is performed, hence I am getting an error in the merge:&lt;/P&gt;
&lt;P&gt;Error in SQL statement: UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways.&lt;/P&gt;
&lt;P&gt;Can someone please help me resolve this issue?&lt;/P&gt;
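&lt;P&gt;One commonly suggested way to make such a merge deterministic (a sketch only; colA, colB, updatedAt, source_table and target_table are placeholder names) is to deduplicate the source with row_number() so that each target row matches at most one source row:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;create temporary view deduped_source as
select colA, colB from (
  select s.*, row_number() over (partition by colA order by updatedAt desc) as rn
  from source_table s
) where rn = 1;

merge into target_table t
using deduped_source s
on t.colA = s.colA
when matched then update set t.colB = s.colB;&lt;/CODE&gt;&lt;/PRE&gt;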
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2021 05:23:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/27442#M19314</guid>
      <dc:creator>BhaktiAbhyankar</dc:creator>
      <dc:date>2021-02-03T05:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: SQL Update Join</title>
      <link>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/50925#M28922</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/30095"&gt;@BhaktiAbhyankar&lt;/a&gt;, I'm facing the same kind of issue; my table doesn't have any primary keys to join on,&lt;BR /&gt;so I went with 'leftanti' and union. Is this issue resolved for you? If you found a better approach, please share.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Nov 2023 08:13:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sql-update-join/m-p/50925#M28922</guid>
      <dc:creator>venkatesh547</dc:creator>
      <dc:date>2023-11-11T08:13:39Z</dc:date>
    </item>
  </channel>
</rss>

