Databricks Community

jenshumrich · ‎09-17-2024

Within a script, I noticed that the network connection between the driver and the mounted network drives is often a huge bottleneck. The network throughput seems unreasonably low for Azure.

Single node: Standard_DS12_v2 · DBR: 14.3.x-photon-scala2.12

Are there ways to speed up writing a result to Azure Blob Storage? My current code looks like this:

joined_df.write.partitionBy("IdStation").mode("overwrite").parquet("/mnt/temp_folder")

The CPU's IO wait in particular is more than just weird.

filipniziol · ‎09-17-2024

Hi @jenshumrich ,

There is partitioning by IdStation. How many partitions are created? Isn't it a problem with too many files?
The partition size should be around 1 GB and the file size should be around 128 MB.

I see a lot of IO wait, so this would be in line with my suspicion that too many files are created.
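
One way to check this suspicion, as a minimal sketch: count the distinct IdStation values (each one becomes an output directory), then shuffle by that column before writing so that each directory typically receives one file instead of one per task. The function names below are hypothetical, and `joined_df` is assumed to be the Spark DataFrame from the question; only standard DataFrame calls (`select`, `distinct`, `count`, `repartition`, `write.partitionBy`) are used.

```python
def count_partition_dirs(df, partition_col="IdStation"):
    """Return how many output directories partitionBy(partition_col) would create."""
    return df.select(partition_col).distinct().count()


def write_one_file_per_station(df, path, partition_col="IdStation"):
    """Shuffle rows by partition_col first, so all rows of one station sit in a
    single task and each partition directory typically gets a single file."""
    (
        df.repartition(partition_col)
        .write.partitionBy(partition_col)
        .mode("overwrite")
        .parquet(path)
    )


# Usage in a notebook (assumes joined_df exists, as in the question):
# print(count_partition_dirs(joined_df))
# write_one_file_per_station(joined_df, "/mnt/temp_folder")
```

This keeps the IdStation layout but does not make small partitions any larger; if most stations hold only a few MB, a coarser partition column may still be needed.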

jenshumrich · ‎09-17-2024

Here you can see the really slow network traffic, causing iowait on the CPU

ZoeCole · ‎10-09-2024

Thank you.

jenshumrich · ‎09-18-2024

You are right. I am creating 200 small files of roughly 6 MB each (in the quality system) and a few hundred thousand files in production. The partitioning is motivated by the original business need and further processing. Let me test with a different partitioning.
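
As a rough worked example of the numbers above, assuming the 128 MB target file size suggested earlier in the thread: 200 files of about 6 MB add up to about 1200 MB, which would fit in roughly 10 files. The helper name `files_needed` is hypothetical and only does the arithmetic.

```python
import math

TARGET_FILE_MB = 128  # file size suggested earlier in the thread


def files_needed(total_mb, target_mb=TARGET_FILE_MB):
    """Number of files of about target_mb needed to hold total_mb of data."""
    return max(1, math.ceil(total_mb / target_mb))


# Quality-system figures from the post: 200 files of roughly 6 MB each.
total_mb = 200 * 6  # 1200 MB in total
n_files = files_needed(total_mb)
print(n_files)  # 10
```

A test run with a different partitioning could aim for a file count in that range, for example by partitioning on a coarser column than IdStation, as long as the downstream processing still works with that layout.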

