We have a cursor in DB2 that reads data from two tables on each iteration of a loop. At the end of each iteration, after inserting a row into a target table, we update the records related to that iteration in those two tables before moving on to the next one. An indicative example is below:
OPEN CUR1
FETCH CUR1 INTO V_A1, V_A2, V_C1, V_C3, V_M1, V_M2
WHILE SQLCODE = 0 DO
    SELECT M1 INTO V_M1 FROM TABLE_1 WHERE A1 = V_A1
    SELECT M2 INTO V_M2 FROM TABLE_2 WHERE C1 = V_C1
    IF ..... THEN SET V_B1 = V_M1 - V_M2 ELSE ....
    INSERT INTO TARGET (...) VALUES (V_A1, V_A2, ...)
    UPDATE TABLE_1 SET M1 = M1 - V_B1 WHERE A1 = V_A1
    UPDATE TABLE_2 SET M2 = M2 - V_B1 WHERE C1 = V_C1
    FETCH CUR1 INTO V_A1, V_A2, V_C1, V_C3, V_M1, V_M2
END WHILE
CLOSE CUR1
Just to note that A1 and C1 are not unique across the data.
Could you please suggest a way to transform this logic into PySpark? Performance also matters, since we are dealing with a large amount of data. I looked at an RDD map() approach, but RDDs are immutable, which seems to rule out carrying updated state from one row to the next the way the cursor does. The sketch below shows the direction I was considering.
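This is only a rough sketch of what I had in mind, and it rests on several assumptions of my own: the cursor's rows can be materialized as one pre-joined source (SOURCE_VIEW is a placeholder name), an ORD column captures the cursor's processing order, and the balances of each (A1, C1) pair are independent, i.e. the same A1 does not appear under two different C1 values (if the keys cross-link between groups, this per-group replay is wrong). All column names and types are placeholders too:

from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession.builder.appName("cursor_replay").getOrCreate()

# Hypothetical source: the rows CUR1 would fetch, pre-joined with the
# starting balances from TABLE_1 / TABLE_2 (all names are placeholders)
src = spark.table("SOURCE_VIEW")  # columns: A1, A2, C1, C3, M1, M2, ORD

def replay(pdf: pd.DataFrame) -> pd.DataFrame:
    # Replay the DB2 loop sequentially within one group, carrying the
    # running balances in local variables instead of issuing UPDATEs
    pdf = pdf.sort_values("ORD")
    m1 = float(pdf["M1"].iloc[0])   # starting balance for this A1
    m2 = float(pdf["M2"].iloc[0])   # starting balance for this C1
    out = []
    for row in pdf.itertuples(index=False):
        b1 = m1 - m2                # the THEN branch of the IF
        out.append((row.A1, row.A2, row.C1, row.C3, b1))
        m1 -= b1                    # replaces UPDATE TABLE_1
        m2 -= b1                    # replaces UPDATE TABLE_2
    return pd.DataFrame(out, columns=["A1", "A2", "C1", "C3", "B1"])

result = src.groupBy("A1", "C1").applyInPandas(
    replay, schema="A1 string, A2 string, C1 string, C3 string, B1 double"
)
result.write.mode("append").saveAsTable("TARGET")

What I cannot tell is whether this per-group replay is a reasonable pattern here, or whether there is a native DataFrame approach (window functions, cumulative sums) that avoids the Python loop entirely.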
Thank you in advance