cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Trasform SQL Cursor using Pyspark in Databricks

elgeo
Valued Contributor II

We have a Cursor in DB2 which reads in each loop data from 2 tables. At the end of each loop, after inserting the data to a target table, we update records related to each loop in these 2 tables before moving to the next loop. An indicative example is the below:

FETCH CUR1 INTO V_A1, V_A2, V_C1, V_C3, V_M1, V_M2

SELECT V_M1 FROM TABLE_1 WHERE A1=V_A1

SELECT V_M2 FROM TABLE_2 WHERE C1=V_C1

IF ..... THEN V_B1 = V_M1-V_M2 ELSE ....

INSERT INTO TARGET ... VALUES (V_A1, V_A2, ...)

UPDATE TABLE_1 SET V_M1 = V_M1 - V_B1

UPDATE TABLE_2 SET V_M2 = V_M2 - V_B1

FETCH CUR1 INTO V_A1, V_A2, V_C1, V_C3, V_M1, V_M2

END WHILE

CLOSE CUR1

Just to note that A1, C1 are not unique across the data.

Could you please suggest a way to transform it using Pyspark? Performace also matters as we speak about a large amount of data. I saw that RDDs are immutable in case we were trying RDD-map option.

Thank you in advance

2 REPLIES 2

Anonymous
Not applicable

@ELENI GEORGOUSI​ :

  1. Load the data from the two tables into PySpark DataFrames: df1 and df2
  2. Join the two DataFrames on their common columns: call it df
  3. Define a user-defined function (UDF) that implements your IF statement
  4. Add a new column to the DataFrame that computes using the UDF
  5. Insert the data into the target table
  6. Update the source tables:

Hope this helps to give you a framework on how to think and go about curser.

Anonymous
Not applicable

Hi @ELENI GEORGOUSI​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group