In one of my scripts, I noticed that the network connection between the driver and the mounted network drives is often a huge bottleneck. The network throughput seems unreasonably low for an Azure cluster:
- Single node: Standard_DS12_v2 · DBR: 14.3.x-photon-scala2.12
Are there ways to speed up writing a result to Azure Blob Storage? My current code looks like this:
joined_df.write.partitionBy("IdStation").mode("overwrite").parquet("/mnt/temp_folder")
The CPU's IO wait in particular looks very odd.
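For comparison with the write call above, one commonly suggested variant is to repartition by the partition column before writing, so that each IdStation directory receives fewer, larger files instead of one small file per task. This is only a minimal sketch: the helper name write_partitioned is hypothetical, it assumes joined_df is a Spark DataFrame, and whether it actually helps with the throughput problem described here has not been confirmed.

```python
def write_partitioned(df, path, partition_col="IdStation"):
    """Write df as Parquet, partitioned on disk by partition_col.

    Hypothetical helper (not part of any library). Repartitioning by the
    same column first shuffles all rows of one station into the same task,
    so each output directory gets fewer, larger files.
    """
    (
        df.repartition(partition_col)   # shuffle rows by the partition column
        .write.partitionBy(partition_col)
        .mode("overwrite")              # same overwrite semantics as the original call
        .parquet(path)
    )


# Usage (assumes joined_df already exists in the notebook):
# write_partitioned(joined_df, "/mnt/temp_folder")
```

The trade-off is an extra shuffle before the write, and with skewed station sizes a single task may end up writing a very large file.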