09-17-2024 04:02 AM
Within a script, I noticed that the network connection between the driver and the mounted network drives is often a huge bottleneck. It seems that the network throughput is unreasonably low for being an Azure
- Single node: Standard_DS12_v2 · DBR: 14.3.x-photon-scala2.12
Are there ways to speed up storing a result to Azure Blob Storage? My current code looks like this:
- Labels: Spark
Accepted Solutions
09-17-2024 10:19 AM - edited 09-17-2024 10:20 AM
Hi @jenshumrich ,
The data is partitioned by IdStation. How many partitions are created? Could the problem be that too many files are written?
The partition size should be around 1 GB, and the file size should be at or around 128 MB.
I see a lot of IO wait, which is in line with my suspicion that too many files are being created.
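The sizing guidance above comes down to simple arithmetic, which can be sketched roughly as follows. This is only an illustration: the helper name `target_file_count` is hypothetical, and the 128 MB default is taken from the rule of thumb in this reply, not from any Spark setting.

```python
import math

MB = 1024 * 1024


def target_file_count(total_bytes: int, target_file_bytes: int = 128 * MB) -> int:
    """Estimate how many output files keep each file at or below the target size.

    Hypothetical helper: the 128 MB default mirrors the rule of thumb quoted
    in the thread and is an assumption, not a Spark configuration value.
    """
    if total_bytes <= 0:
        # Nothing to split; a single (possibly empty) file is enough.
        return 1
    return max(1, math.ceil(total_bytes / target_file_bytes))


# A partition of about 1 GB split into files of about 128 MB
print(target_file_count(1024 * MB))
```

A number like this could then be passed to `DataFrame.repartition(n)` before the write, so that each output partition is written as a few larger files instead of many small ones. Whether that helps depends on the actual data volume per IdStation.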
09-17-2024 04:03 AM
Here you can see the really slow network traffic, causing iowait on the CPU.
10-09-2024 11:54 PM
Thank you.
09-18-2024 04:04 AM
You are right. I am creating 200 small files of roughly 6 MB each (in the quality system) and a few hundred thousand files in production. The partitioning is motivated by the original business need and by further processing. Let me test with a different partitioning.
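File counts and sizes like these can be checked with a small scan of the output directory. The sketch below is one possible way to do it, assuming the mounted storage is reachable through the local file API. The function name `summarize_output` and the example path are hypothetical.

```python
import os
from collections import defaultdict

MB = 1024 * 1024


def summarize_output(root: str) -> dict:
    """Return {relative_partition_dir: (file_count, total_bytes)} under root.

    Files whose names start with "_" or "." (for example _SUCCESS markers
    or .crc checksums) are skipped so that only data files are counted.
    """
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            if name.startswith(("_", ".")):
                continue
            stats[rel][0] += 1
            stats[rel][1] += os.path.getsize(os.path.join(dirpath, name))
    return {key: (count, size) for key, (count, size) in stats.items()}


# Usage sketch; "/dbfs/mnt/results" is a hypothetical mount path:
# for part, (count, size) in sorted(summarize_output("/dbfs/mnt/results").items()):
#     print(part, count, f"{size / MB / count:.1f} MB per file")
```

On a directory with hundreds of thousands of files, the scan itself can take a while over a mounted drive, so it may be worth running it on one partition directory first.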

