<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How can I insert into 2 tables within one database transaction with spark SQL / pyspark? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38537#M26666</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/5427"&gt;@thomasthomas&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;What I would do is use the &lt;A href="https://docs.databricks.com/sql/language-manual/delta-restore.html" target="_self"&gt;RESTORE&lt;/A&gt; command to roll back in case of a failure.&lt;BR /&gt;It would work like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import max as _max, col

tgt_table_name = "catalog.schema.tbl_name"

# Get current table version
ver_df = (
   spark.sql(f"DESCRIBE HISTORY {tgt_table_name}")
        .select(_max(col("version")).alias("version"))
)

tbl_ver = ver_df.collect()[0].version

try:
   # Your code to transfer data here
   pass  # placeholder so the block is valid Python

except Exception as exc:
   # Roll the table back to the version recorded before the load
   spark.sql(f"RESTORE TABLE {tgt_table_name} TO VERSION AS OF {tbl_ver}")
   raise Exception(f"Load of {tgt_table_name} failed. Restored to {tbl_ver}") from exc&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 27 Jul 2023 06:24:25 GMT</pubDate>
    <dc:creator>daniel_sahal</dc:creator>
    <dc:date>2023-07-27T06:24:25Z</dc:date>
    <item>
      <title>How can I insert into 2 tables within one database transaction with spark SQL / pyspark?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38498#M26647</link>
      <description>&lt;P&gt;Hi all,&lt;BR /&gt;&lt;BR /&gt;I have a Postgres database that contains two tables, A and B.&lt;BR /&gt;&lt;BR /&gt;I also have two Delta tables, C and D. My task is to push the data from A to C and from B to D and, if anything fails, leave everything as it was.&lt;BR /&gt;&lt;BR /&gt;With plain Python this is easy: set up the connection, create a cursor, push all the data into the DB, commit at the end, then close the cursor and connection.&lt;BR /&gt;&lt;BR /&gt;With PySpark/Spark SQL this is not trivial. It looks like Spark commits after each insert operation, which is not ideal because I don't want to leave any mess behind if something fails.&lt;BR /&gt;&lt;BR /&gt;An alternative solution is to maintain a temporary schema and open a Postgres connection only once all the data has been pushed to the temp schema. Then I just call the function as is, and if something fails in the middle of it, everything remains clean.&lt;BR /&gt;&lt;BR /&gt;Please advise.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2023 13:08:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38498#M26647</guid>
      <dc:creator>thomasthomas</dc:creator>
      <dc:date>2023-07-26T13:08:13Z</dc:date>
    </item>
    <item>
      <title>Re: How can I insert into 2 tables within one database transaction with spark SQL / pyspark?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38537#M26666</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/5427"&gt;@thomasthomas&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;What I would do is use the &lt;A href="https://docs.databricks.com/sql/language-manual/delta-restore.html" target="_self"&gt;RESTORE&lt;/A&gt; command to roll back in case of a failure.&lt;BR /&gt;It would work like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import max as _max, col

tgt_table_name = "catalog.schema.tbl_name"

# Get current table version
ver_df = (
   spark.sql(f"DESCRIBE HISTORY {tgt_table_name}")
        .select(_max(col("version")).alias("version"))
)

tbl_ver = ver_df.collect()[0].version

try:
   # Your code to transfer data here
   pass  # placeholder so the block is valid Python

except Exception as exc:
   # Roll the table back to the version recorded before the load
   spark.sql(f"RESTORE TABLE {tgt_table_name} TO VERSION AS OF {tbl_ver}")
   raise Exception(f"Load of {tgt_table_name} failed. Restored to {tbl_ver}") from exc&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jul 2023 06:24:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38537#M26666</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2023-07-27T06:24:25Z</dc:date>
    </item>
    <item>
      <title>Re: How can I insert into 2 tables within one database transaction with spark SQL / pyspark?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38547#M26670</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/5427"&gt;@thomasthomas&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We haven't heard from you since the last response from &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79106"&gt;@daniel_sahal&lt;/a&gt;&lt;/SPAN&gt;&lt;SPAN&gt;, and I was checking back to see if their suggestion helped you.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jul 2023 08:01:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38547#M26670</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-07-27T08:01:53Z</dc:date>
    </item>
    <item>
      <title>Re: How can I insert into 2 tables within one database transaction with spark SQL / pyspark?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38554#M26673</link>
      <description>&lt;P&gt;As I described above, I am trying to write the content of 2 Delta tables to 2 Postgres tables with an insert statement, using either Spark SQL or PySpark.&lt;BR /&gt;&lt;BR /&gt;RESTORE TO VERSION and DESCRIBE HISTORY are valid statements only when you work with a Delta table. They don't work otherwise.&lt;BR /&gt;&lt;BR /&gt;@Anonymous&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79106"&gt;@daniel_sahal&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jul 2023 09:40:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38554#M26673</guid>
      <dc:creator>thomasthomas</dc:creator>
      <dc:date>2023-07-27T09:40:06Z</dc:date>
    </item>
    <item>
      <title>Re: How can I insert into 2 tables within one database transaction with spark SQL / pyspark?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38654#M26698</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/5427"&gt;@thomasthomas&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Ah, sorry. I misunderstood your question.&lt;BR /&gt;&lt;BR /&gt;In this case, the approach you describe is a good one: set up something like "staging" tables and push the data there. Once everything is loaded, merge it into the actual tables.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jul 2023 10:35:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-insert-into-2-tables-within-one-database-transaction/m-p/38654#M26698</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2023-07-28T10:35:13Z</dc:date>
    </item>
  </channel>
</rss>

