Hi all,
I have a Postgres database that contains two tables, A and B. I also have two Delta tables, C and D. My task is to push the data from A to C and from B to D, and if anything fails, to leave everything as it was.
With plain Python this is easy: set up the connection, create a cursor, push all the data into the DB, commit once at the end, and close the cursor and connection.
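For comparison, this is the single-transaction pattern I mean, as a rough sketch with psycopg2 (the connection string, table names, and data are just placeholders):

```python
import psycopg2

# Placeholder data and table names, only to show the pattern.
rows_one = [(1, "foo"), (2, "bar")]
rows_two = [(1, 10.0), (2, 20.5)]

conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")
cur = conn.cursor()
try:
    cur.executemany("INSERT INTO target_one (id, name) VALUES (%s, %s)", rows_one)
    cur.executemany("INSERT INTO target_two (id, amount) VALUES (%s, %s)", rows_two)
    conn.commit()        # single commit at the very end
except Exception:
    conn.rollback()      # if anything goes wrong, nothing is left behind
    raise
finally:
    cur.close()
    conn.close()
```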
With PySpark/Spark SQL this is not trivial. It looks like Spark commits after each insert operation, which is not ideal because I don't want to leave any mess behind if something fails.
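For example, the straightforward Spark version would look something like the sketch below (JDBC options and table names are placeholders, and I'm assuming the Delta tables are registered in the metastore). As far as I can tell, each write is its own independent commit, so if the second write fails, the first one is already done:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jdbc_opts = {  # placeholders for the real connection details
    "url": "jdbc:postgresql://host:5432/mydb",
    "user": "me",
    "password": "secret",
    "driver": "org.postgresql.Driver",
}

df_a = spark.read.format("jdbc").options(dbtable="public.a", **jdbc_opts).load()
df_b = spark.read.format("jdbc").options(dbtable="public.b", **jdbc_opts).load()

# Each Delta write below is its own commit; there is no transaction
# spanning both tables.
df_a.write.format("delta").mode("append").saveAsTable("c")  # commit #1
df_b.write.format("delta").mode("append").saveAsTable("d")  # commit #2 -- if this
# fails, the append to c has already happened and is not rolled back.
```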
An alternative I'm considering is to maintain a temporary schema: push all the data into the temp schema first, then open a single Postgres connection and call the function as is. If something fails in the middle of the function, everything stays clean.
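Concretely, that idea would look roughly like this (a sketch only: the staging table names, connection details, and the move_staging_to_final() function name are placeholders for whatever I'd actually use):

```python
import psycopg2
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder DataFrames standing in for the data that needs to be moved.
df_for_c = spark.createDataFrame([(1, "foo")], ["id", "name"])
df_for_d = spark.createDataFrame([(1, 10.0)], ["id", "amount"])

jdbc_opts = {  # placeholders for the real connection details
    "url": "jdbc:postgresql://host:5432/mydb",
    "user": "me",
    "password": "secret",
    "driver": "org.postgresql.Driver",
}

# Step 1: Spark dumps everything into throwaway staging tables first.
df_for_c.write.format("jdbc").options(dbtable="staging.c", **jdbc_opts).mode("overwrite").save()
df_for_d.write.format("jdbc").options(dbtable="staging.d", **jdbc_opts).mode("overwrite").save()

# Step 2: a single Postgres transaction calls the function that moves
# staging -> final; move_staging_to_final() is a placeholder name.
conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")
try:
    with conn:                      # commits once on success, rolls back on any error
        with conn.cursor() as cur:
            cur.execute("SELECT move_staging_to_final()")
finally:
    conn.close()
```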
Please advise.