I am running four spatial join operations on files with the following sizes:
- Base_road_file: 1 GB
- Telematics file: 1.2 GB
- State boundary file, BH road file, client_geofence file and kpmg_geofence_file: none of these is large
My Databricks cluster details are as follows:
13.3 LTS runtime, Standard_DS5_v2 (56 GB memory, 16 cores) for both the driver and the worker nodes
The issue is that the joins finish within seconds, but writing the result to a Delta table times out my entire run. Moreover, even if I increase the timeout, the whole operation keeps running for hours, which is not acceptable for my client.
I have tried repartition and have also added optimizeWrite to my Spark session settings, but nothing seems to help. Could anyone please suggest a way to make the write operation faster?
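
For reference, the write step is roughly of the following shape. This is only a sketch and not my exact code: the table name, the partition count, the "overwrite" mode and the helper function names are placeholders made up for illustration. The only real setting here is the Delta optimized-write config key.

```python
# Sketch of the write-side settings described above.
# Table name, partition count and write mode are placeholders (assumptions),
# not the values used in the real job.

# Databricks Delta session setting that enables optimized writes.
OPTIMIZE_WRITE_KEY = "spark.databricks.delta.optimizeWrite.enabled"


def enable_optimize_write(spark):
    """Turn on Delta optimized writes for the given Spark session."""
    spark.conf.set(OPTIMIZE_WRITE_KEY, "true")


def write_joined(df, table_name, num_partitions=None):
    """Optionally repartition the joined DataFrame, then save it as a Delta table."""
    if num_partitions is not None:
        # repartition() shuffles the data into the requested number of partitions
        df = df.repartition(num_partitions)
    # "overwrite" is an assumed mode for this sketch
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
```

In a Databricks notebook this would be called as `enable_optimize_write(spark)` followed by something like `write_joined(joined_df, "my_schema.joined_output", 64)`, where `joined_df` stands for the result of the four spatial joins and the other arguments are again hypothetical.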