<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What's the easiest way to upsert data into a table? (Azure ADLS Gen2) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17790#M11739</link>
    <description>&lt;P&gt;HI @Alex Netelkos​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please check the dynamic partitions parameter, more details here: &lt;A href="https://docs.databricks.com/delta/selective-overwrite.html#dynamic-partition-overwrites" target="test_blank"&gt;https://docs.databricks.com/delta/selective-overwrite.html#dynamic-partition-overwrites&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;thanks,&lt;/P&gt;&lt;P&gt;Pat.&lt;/P&gt;</description>
    <pubDate>Thu, 08 Dec 2022 17:39:41 GMT</pubDate>
    <dc:creator>Pat</dc:creator>
    <dc:date>2022-12-08T17:39:41Z</dc:date>
    <item>
      <title>What's the easiest way to upsert data into a table? (Azure ADLS Gen2)</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17789#M11738</link>
      <description>&lt;P&gt;I had been trying to upsert rows into a table in Azure Blob Storage (ADLS Gen 2) based on two partitions (sample code below). &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;insert overwrite table new_clicks_table partition(client_id, mm_date)
select 
	click_id
	,user_id
	,click_timestamp_gmt
	,campaign_id
	,site_url
	,client_id
	,mm_date
from 
	old_clicks_table
where 
        mm_date between '2022-12-01' and '2022-12-07'
and 
        client_id in (123456)
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, using "INSERT OVERWRITE" deleted all the data that was previously in that table, not just updating the data based on the give partitions (ie: all data in the table was deleted for client ID 123456 before 2022-12-01 and I was just left with data from 2022-12-01 thru 2022-12-07). &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What's the easiest way to upsert this data into my table for only the specified partitions in my query and retain the rest of my data? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can this be done using a combination of "INSERT" and "OVERWRITE" or do I need to use "MERGE"? If the latter, can someone provide the best example of how to write this data using "MERGE"?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2022 17:16:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17789#M11738</guid>
      <dc:creator>Netty</dc:creator>
      <dc:date>2022-12-08T17:16:44Z</dc:date>
    </item>
    <item>
      <title>Re: What's the easiest way to upsert data into a table? (Azure ADLS Gen2)</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17790#M11739</link>
      <description>&lt;P&gt;HI @Alex Netelkos​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please check the dynamic partitions parameter, more details here: &lt;A href="https://docs.databricks.com/delta/selective-overwrite.html#dynamic-partition-overwrites" target="test_blank"&gt;https://docs.databricks.com/delta/selective-overwrite.html#dynamic-partition-overwrites&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;thanks,&lt;/P&gt;&lt;P&gt;Pat.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2022 17:39:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17790#M11739</guid>
      <dc:creator>Pat</dc:creator>
      <dc:date>2022-12-08T17:39:41Z</dc:date>
    </item>
    <item>
      <title>Re: What's the easiest way to upsert data into a table? (Azure ADLS Gen2)</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17791#M11740</link>
      <description>&lt;P&gt;Hi Pat,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;That worked! Thanks so much for the easy to implement tip.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a way for me to add that snippet to my Spark Config instead of having to put it in every single query I run?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best wishes,&lt;/P&gt;&lt;P&gt;Netty&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2022 19:54:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17791#M11740</guid>
      <dc:creator>Netty</dc:creator>
      <dc:date>2022-12-08T19:54:10Z</dc:date>
    </item>
    <item>
      <title>Re: What's the easiest way to upsert data into a table? (Azure ADLS Gen2)</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17792#M11741</link>
      <description>&lt;P&gt;I think you should be able to set it once, at the top of your notebook for example. It's spark session parameter, it will be used until you will restart the cluster.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2022 20:02:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17792#M11741</guid>
      <dc:creator>Pat</dc:creator>
      <dc:date>2022-12-08T20:02:17Z</dc:date>
    </item>
    <item>
      <title>Re: What's the easiest way to upsert data into a table? (Azure ADLS Gen2)</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17793#M11742</link>
      <description>&lt;P&gt;HI&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0058Y00000BzqXOQAZ" alt="https://community.databricks.com/s/profile/0058Y00000BzqXOQAZ" target="_blank"&gt;@Netty&lt;/A&gt;&amp;nbsp;(Customer)​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please check the dynamic partitions parameter from here:&amp;nbsp;&lt;A href="https://docs.databricks.com/delta/selective-overwrite.html#dynamic-partition-overwrites" alt="https://docs.databricks.com/delta/selective-overwrite.html#dynamic-partition-overwrites" target="_blank"&gt;https://docs.databricks.com/delta/selective-overwrite.html#dynamic-partition-overwrites&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Dec 2022 05:31:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17793#M11742</guid>
      <dc:creator>Rishabh-Pandey</dc:creator>
      <dc:date>2022-12-09T05:31:43Z</dc:date>
    </item>
    <item>
      <title>Re: What's the easiest way to upsert data into a table? (Azure ADLS Gen2)</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17794#M11743</link>
      <description>&lt;P&gt;Below code might help you&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Python-
(df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .saveAsTable("default.people10m")
)
&amp;nbsp;
SQL-
SET spark.sql.sources.partitionOverwriteMode=dynamic;
INSERT OVERWRITE TABLE default.people10m SELECT * FROM morePeople;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Dec 2022 07:37:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-easiest-way-to-upsert-data-into-a-table-azure-adls/m-p/17794#M11743</guid>
      <dc:creator>Ajay-Pandey</dc:creator>
      <dc:date>2022-12-09T07:37:54Z</dc:date>
    </item>
  </channel>
</rss>

