02-24-2026 07:16 AM
Hi isyed,
Apologies for the late response.
For our use case, we changed the code from PySpark DataFrames to Spark SQL. Instead of keeping all the records in memory, the SQL version writes each loop's results to tables and then performs the next loop. Ours is a typical hierarchical loop over 200 million records. Previously, every loop stored its result in a DataFrame and then calculated the next hierarchy level from it, and so on, which caused the issue. After changing the logic to SQL (inserting records into temp tables on every loop), the code ran faster: nothing is held in memory between loops, because each loop's records are written to temp tables and the record count for the next loop is reduced.
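To make the pattern concrete, here is a minimal sketch of the loop shape, not our actual job. It uses Python's built-in `sqlite3` as a stand-in for Spark SQL so it runs anywhere; the table names (`edges`, `resolved`), the parent/child columns, and the function `resolve_hierarchy` are all hypothetical. In Spark, statements of this kind would be run with `spark.sql(...)` against your own tables.

```python
import sqlite3


def resolve_hierarchy(edges, root):
    """Return {node: level}, resolving one hierarchy level per loop.

    Each level is inserted into a temp table instead of being accumulated
    in an in-memory structure, and each loop joins only against the level
    written by the previous loop.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for the Spark session
    conn.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)

    # Hypothetical temp table that receives the output of every loop.
    conn.execute("CREATE TEMP TABLE resolved (node TEXT PRIMARY KEY, level INTEGER)")
    conn.execute("INSERT INTO resolved VALUES (?, 0)", (root,))

    level = 0
    while True:
        # Write the next level straight into the table. Nodes that are
        # already resolved are skipped, so the work shrinks every loop.
        conn.execute(
            """
            INSERT INTO resolved (node, level)
            SELECT DISTINCT e.child, ?
            FROM edges AS e
            JOIN resolved AS r ON r.node = e.parent
            WHERE r.level = ?
              AND e.child NOT IN (SELECT node FROM resolved)
            """,
            (level + 1, level),
        )
        added = conn.execute(
            "SELECT COUNT(*) FROM resolved WHERE level = ?", (level + 1,)
        ).fetchone()[0]
        if added == 0:
            break  # no new records at this level: hierarchy fully resolved
        level += 1

    result = dict(conn.execute("SELECT node, level FROM resolved").fetchall())
    conn.close()
    return result
```

For example, `resolve_hierarchy([("a", "b"), ("b", "c")], "a")` resolves `a`, then `b`, then `c`, one level per loop. The point is only the structure: persist each loop's output, then drive the next loop from what was just written.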
Hope this helps.