<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Transform SQL Cursor using PySpark in Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/trasform-sql-cursor-using-pyspark-in-databricks/m-p/9500#M4851</link>
    <description>&lt;P&gt;@ELENI GEORGOUSI:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Load the data from the two tables into PySpark DataFrames, df1 and df2.&lt;/LI&gt;&lt;LI&gt;Join the two DataFrames on their common columns; call the result df.&lt;/LI&gt;&lt;LI&gt;Define a user-defined function (UDF) that implements your IF statement.&lt;/LI&gt;&lt;LI&gt;Add a new column to df, computed with the UDF.&lt;/LI&gt;&lt;LI&gt;Insert the data into the target table.&lt;/LI&gt;&lt;LI&gt;Update the source tables.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I hope this gives you a framework for thinking about how to replace the cursor.&lt;/P&gt;</description>
    <pubDate>Wed, 08 Mar 2023 06:52:13 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-03-08T06:52:13Z</dc:date>
    <item>
      <title>Transform SQL Cursor using PySpark in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/trasform-sql-cursor-using-pyspark-in-databricks/m-p/9499#M4850</link>
      <description>&lt;P&gt;We have a cursor in DB2 that reads data from two tables on each iteration. At the end of each iteration, after inserting the data into a target table, we update the records related to that iteration in both tables before moving on to the next one. An indicative example:&lt;/P&gt;&lt;P&gt;FETCH CUR1 INTO V_A1, V_A2, V_C1, V_C3, V_M1, V_M2&lt;/P&gt;&lt;P&gt;SELECT V_M1 FROM TABLE_1 WHERE A1=V_A1&lt;/P&gt;&lt;P&gt;SELECT V_M2 FROM TABLE_2 WHERE C1=V_C1&lt;/P&gt;&lt;P&gt;IF ..... THEN V_B1 = V_M1-V_M2 ELSE ....&lt;/P&gt;&lt;P&gt;INSERT INTO TARGET ... VALUES (V_A1, V_A2, ...)&lt;/P&gt;&lt;P&gt;UPDATE TABLE_1 SET V_M1 = V_M1 - V_B1&lt;/P&gt;&lt;P&gt;UPDATE TABLE_2 SET V_M2 = V_M2 - V_B1&lt;/P&gt;&lt;P&gt;FETCH CUR1 INTO V_A1, V_A2, V_C1, V_C3, V_M1, V_M2&lt;/P&gt;&lt;P&gt;END WHILE&lt;/P&gt;&lt;P&gt;CLOSE CUR1&lt;/P&gt;&lt;P&gt;Note that A1 and C1 are not unique across the data.&lt;/P&gt;&lt;P&gt;Could you please suggest a way to transform this using PySpark? Performance also matters, since we are dealing with a large amount of data. I saw that RDDs are immutable, which is a problem if we try the RDD map option.&lt;/P&gt;&lt;P&gt;Thank you in advance.&lt;/P&gt;</description>
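      <content:encoded><![CDATA[
The loop described above can be written out as a plain-Python reference, which may help when checking any PySpark rewrite against the original behaviour. This is only a sketch under stated assumptions, not a Spark solution: TABLE_1 and TABLE_2 are modelled as dictionaries keyed by A1 and C1, the UPDATE statements are assumed to apply to the row matched by the preceding SELECT, and `compute_b1` is a hypothetical caller-supplied stand-in for the IF rule that the post elides.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# One cursor row: (A1, A2, C1, C3), mirroring the key columns fetched by CUR1.
CursorRow = Tuple[str, str, str, str]


def run_cursor(
    rows: Iterable[CursorRow],
    table_1: Dict[str, float],  # assumption: A1 -> current M1 value
    table_2: Dict[str, float],  # assumption: C1 -> current M2 value
    compute_b1: Callable[[float, float], float],  # hypothetical: the elided IF rule
) -> List[Tuple[str, str, str, str, float]]:
    """Replay the cursor loop row by row and return the rows for TARGET."""
    target = []
    for a1, a2, c1, c3 in rows:
        # SELECT V_M1 / V_M2: re-read the current values on every iteration.
        m1 = table_1[a1]
        m2 = table_2[c1]
        # IF ... THEN ... ELSE ...: delegated to the caller-supplied rule.
        b1 = compute_b1(m1, m2)
        # INSERT INTO TARGET
        target.append((a1, a2, c1, c3, b1))
        # UPDATE TABLE_1 / TABLE_2 (assumed to be keyed by A1 and C1).
        table_1[a1] = m1 - b1
        table_2[c1] = m2 - b1
    return target


# Illustrative data and rule only; min() is NOT the rule from the post.
table_1 = {"a": 10.0}
table_2 = {"c": 4.0, "d": 9.0}
rows = [("a", "x", "c", "z"), ("a", "y", "d", "w")]
result = run_cursor(rows, table_1, table_2, lambda m1, m2: min(m1, m2))
```

Because A1 repeats in the sample rows, the second iteration reads the value left behind by the first. That order dependence is what a set-based rewrite would need to preserve.
      ]]></content:encoded>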
      <pubDate>Mon, 13 Feb 2023 13:07:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trasform-sql-cursor-using-pyspark-in-databricks/m-p/9499#M4850</guid>
      <dc:creator>elgeo</dc:creator>
      <dc:date>2023-02-13T13:07:31Z</dc:date>
    </item>
    <item>
      <title>Re: Transform SQL Cursor using PySpark in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/trasform-sql-cursor-using-pyspark-in-databricks/m-p/9500#M4851</link>
      <description>&lt;P&gt;@ELENI GEORGOUSI:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Load the data from the two tables into PySpark DataFrames, df1 and df2.&lt;/LI&gt;&lt;LI&gt;Join the two DataFrames on their common columns; call the result df.&lt;/LI&gt;&lt;LI&gt;Define a user-defined function (UDF) that implements your IF statement.&lt;/LI&gt;&lt;LI&gt;Add a new column to df, computed with the UDF.&lt;/LI&gt;&lt;LI&gt;Insert the data into the target table.&lt;/LI&gt;&lt;LI&gt;Update the source tables.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I hope this gives you a framework for thinking about how to replace the cursor.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Mar 2023 06:52:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trasform-sql-cursor-using-pyspark-in-databricks/m-p/9500#M4851</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-08T06:52:13Z</dc:date>
    </item>
    <item>
      <title>Re: Transform SQL Cursor using PySpark in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/trasform-sql-cursor-using-pyspark-in-databricks/m-p/9501#M4852</link>
      <description>&lt;P&gt;Hi @ELENI GEORGOUSI,&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation, and let us know if you need any further assistance!&lt;/P&gt;</description>
      <pubDate>Mon, 10 Apr 2023 10:11:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trasform-sql-cursor-using-pyspark-in-databricks/m-p/9501#M4852</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-10T10:11:21Z</dc:date>
    </item>
  </channel>
</rss>

