Network bottleneck

jenshumrich

Within a script, I noticed that the network connection between the driver and the mounted network drives is often a huge bottleneck. The network throughput seems unreasonably low for an Azure VM:

  • Single node: Standard_DS12_v2 · DBR: 14.3.x-photon-scala2.12

Are there ways to speed up writing a result to Azure Blob Storage? My current code looks like this:

joined_df.write.partitionBy("IdStation").mode("overwrite").parquet("/mnt/temp_folder")
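One common cause of slow `partitionBy` writes to blob storage is the number of files: each write task opens a separate file for every `IdStation` value it happens to hold, so many tasks times many stations can mean many small files and many separate storage requests. The plain-Python model below is a simplification (it ignores `maxRecordsPerFile` and however Spark actually partitioned `joined_df` after the join; all function names are hypothetical). It counts output files for two layouts: rows spread round-robin across tasks, versus rows hash-partitioned by the station key, which is roughly what `repartition("IdStation")` does before the write.

```python
def files_written(task_rows):
    """Simplified model of DataFrameWriter.partitionBy: each task writes one
    file per distinct partition value it holds (maxRecordsPerFile ignored)."""
    return sum(len(set(rows)) for rows in task_rows)


def round_robin(keys, num_tasks):
    """Spread rows across tasks without regard to their key."""
    tasks = [[] for _ in range(num_tasks)]
    for i, key in enumerate(keys):
        tasks[i % num_tasks].append(key)
    return tasks


def hash_partition(keys, num_tasks):
    """Send every row with the same key to the same task."""
    tasks = [[] for _ in range(num_tasks)]
    for key in keys:
        tasks[hash(key) % num_tasks].append(key)
    return tasks


stations = [i % 50 for i in range(5000)]  # 50 hypothetical station IDs
print(files_written(round_robin(stations, 8)))     # 200
print(files_written(hash_partition(stations, 8)))  # 50
```

With 50 stations over 8 tasks, the round-robin layout produces 200 files and the hash layout produces 50, one per station. In Spark the corresponding change would be `joined_df.repartition("IdStation").write.partitionBy("IdStation").mode("overwrite").parquet("/mnt/temp_folder")`. The trade-off is that each station is then written by a single task, so one very large station can become a straggler.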
 
The CPU's I/O wait in particular looks far higher than it should be.
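To put a number on "unreasonably low", one option is to time the write and divide by the size of what landed on the mount. A minimal sketch in plain Python, assuming the mount is also readable from the driver through the `/dbfs/mnt/...` FUSE path; the helper names `directory_size_bytes` and `timed_throughput` are hypothetical:

```python
import os
import time


def directory_size_bytes(path):
    """Sum the sizes of all files below `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def timed_throughput(write_fn, path):
    """Run write_fn(), then return (bytes under path, seconds, MB/s)."""
    start = time.perf_counter()
    write_fn()
    elapsed = time.perf_counter() - start
    size = directory_size_bytes(path)
    # Guard against a zero-length timing on very fast writes.
    mb_per_s = size / 1e6 / elapsed if elapsed > 0 else float("inf")
    return size, elapsed, mb_per_s
```

For example: `timed_throughput(lambda: joined_df.write.partitionBy("IdStation").mode("overwrite").parquet("/mnt/temp_folder"), "/dbfs/mnt/temp_folder")`. Because Spark evaluates lazily, the measured time also includes computing `joined_df` (the join itself), so the figure is a lower bound on the pure write throughput.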