chanukya-pekala
Contributor III

I can suggest a few tweaks to the compute. The current D-series is good enough in general, but since we are handling a large volume of data, please try raising the minimum workers from 1 to at least 4 and switching to a bigger VM type such as Standard_E64ds_v5, or failing that another memory-optimized instance. I/O is high while extracting the data from external systems, so we should measure which stage is the bottleneck, the extraction or the write; a better recommendation can be given once we know that. Also try to find better partition keys: run a group-by on the records against the candidate partition keys to check how evenly the data is distributed. Just a few ideas, please give them a try!
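
To make the partition-key check concrete, here is a minimal sketch in plain Python. The sample records, the `region` key, and the helper names `partition_key_distribution` and `skew_ratio` are all made up for illustration. On a Spark DataFrame the equivalent check would be roughly `df.groupBy("<key>").count()`, sorted by count.

```python
from collections import Counter


def partition_key_distribution(records, key):
    """Count records per value of the candidate partition key, largest group first."""
    counts = Counter(rec[key] for rec in records)
    return counts.most_common()


def skew_ratio(distribution):
    """Largest group size divided by the mean group size (1.0 means perfectly even)."""
    sizes = [n for _, n in distribution]
    if not sizes:
        raise ValueError("distribution is empty")
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical sample data; "region" is an assumed candidate partition key.
sample = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "EU"},
    {"id": 3, "region": "EU"},
    {"id": 4, "region": "EU"},
    {"id": 5, "region": "US"},
    {"id": 6, "region": "APAC"},
]

distribution = partition_key_distribution(sample, "region")
for value, count in distribution:
    print(f"{value}: {count}")
print(f"skew ratio: {skew_ratio(distribution):.2f}")
```

A ratio well above 1 suggests that one key value dominates, so a few tasks would end up doing most of the work; in that case a different key, or a combination of keys, is probably worth trying.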


Chanukya