<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Best approach for writing/updating delta tables from python? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-approach-for-writing-updating-delta-tables-from-python/m-p/134672#M50167</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We are migrating a local dash app to the Databricks infrastructure (using databricks apps and our delta lake).&lt;/P&gt;&lt;P&gt;The local app does the following (among others):&lt;/P&gt;&lt;P&gt;- takes Excel files form the end-user&lt;/P&gt;&lt;P&gt;- read in-memory and transforms to pandas dataframe&lt;/P&gt;&lt;P&gt;- Appends to a relevant table in sqlite, or updates rows if the unique id already exists. It uses sqlalchemy as support.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to replace these tasks the Databricks way, but haven't been able to do it effectively, despite the overwhelming amount of options.&lt;/P&gt;&lt;P&gt;My idea was to stay as close to the local app design and get it working, so that I can in the longer term make it more "production-ready". My local app takes under a second to finish the job, whilst my new app is quite slow, even though the apps are mostly the same. What I have tried so far and ideas:&lt;/P&gt;&lt;P&gt;-&amp;nbsp;Take excel file from user, and use&amp;nbsp;&lt;SPAN&gt;databricks-sqlalchemy or databricks-sql-connector to write/update the data to a delta table, using a sql warehouse. For&amp;nbsp;databricks-sql-connector I now am using the&amp;nbsp;INSERT into option, and it's been working okayish so far (the tables I am working with are not the biggest). I identify new rows and these are all inserted. For updates, I DELETE the old rows (matching ID with new rows) and INSERT the new rows. This feels shaky, complex and it's quite slow overall.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;- I am going to write the excel files to volumes, for data lineage and auditing purposes. I was thinking from there, I could use other tools, such as Spark. But not sure about it?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;- We already use DBT with seeds in production. If all fails, I might have to take these tools into account, although it will take more time to finish the migration project.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 12 Oct 2025 18:37:59 GMT</pubDate>
    <dc:creator>georgemichael40</dc:creator>
    <dc:date>2025-10-12T18:37:59Z</dc:date>
    <item>
      <title>Best approach for writing/updating delta tables from python?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-approach-for-writing-updating-delta-tables-from-python/m-p/134672#M50167</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We are migrating a local dash app to the Databricks infrastructure (using databricks apps and our delta lake).&lt;/P&gt;&lt;P&gt;The local app does the following (among others):&lt;/P&gt;&lt;P&gt;- takes Excel files form the end-user&lt;/P&gt;&lt;P&gt;- read in-memory and transforms to pandas dataframe&lt;/P&gt;&lt;P&gt;- Appends to a relevant table in sqlite, or updates rows if the unique id already exists. It uses sqlalchemy as support.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to replace these tasks the Databricks way, but haven't been able to do it effectively, despite the overwhelming amount of options.&lt;/P&gt;&lt;P&gt;My idea was to stay as close to the local app design and get it working, so that I can in the longer term make it more "production-ready". My local app takes under a second to finish the job, whilst my new app is quite slow, even though the apps are mostly the same. What I have tried so far and ideas:&lt;/P&gt;&lt;P&gt;-&amp;nbsp;Take excel file from user, and use&amp;nbsp;&lt;SPAN&gt;databricks-sqlalchemy or databricks-sql-connector to write/update the data to a delta table, using a sql warehouse. For&amp;nbsp;databricks-sql-connector I now am using the&amp;nbsp;INSERT into option, and it's been working okayish so far (the tables I am working with are not the biggest). I identify new rows and these are all inserted. For updates, I DELETE the old rows (matching ID with new rows) and INSERT the new rows. This feels shaky, complex and it's quite slow overall.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;- I am going to write the excel files to volumes, for data lineage and auditing purposes. I was thinking from there, I could use other tools, such as Spark. But not sure about it?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;- We already use DBT with seeds in production. If all fails, I might have to take these tools into account, although it will take more time to finish the migration project.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Oct 2025 18:37:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-approach-for-writing-updating-delta-tables-from-python/m-p/134672#M50167</guid>
      <dc:creator>georgemichael40</dc:creator>
      <dc:date>2025-10-12T18:37:59Z</dc:date>
    </item>
    <item>
      <title>Re: Best approach for writing/updating delta tables from python?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-approach-for-writing-updating-delta-tables-from-python/m-p/134727#M50175</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149417"&gt;@georgemichael40&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My suggestion wud be to try using MERGE INTO for delta tables which works with connector then using delete/insert statements. This will also keep your code in SQL as you wanted. your tables are not large so this shud be sufficient otherwise we need to consider spark/delta combination.&lt;/P&gt;&lt;P&gt;e.g.&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;merge_sql&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;MERGE INTO your_table AS target&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;USING (SELECT * FROM VALUES &lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;values&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt; AS t(id, col1, col2)) AS source&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;ON target.id = source.id&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;WHEN MATCHED THEN UPDATE SET col1 = source.col1, col2 = source.col2&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;WHEN NOT MATCHED THEN INSERT (id, col1, col2) VALUES (source.id, source.col1, source.col2)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 13 Oct 2025 11:00:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-approach-for-writing-updating-delta-tables-from-python/m-p/134727#M50175</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-10-13T11:00:21Z</dc:date>
    </item>
  </channel>
</rss>

