07-31-2023 01:48 AM - edited 07-31-2023 01:50 AM
Hello,
I'm facing a problem with large tarballs to decompress: to make them fit in memory, I had to stop Spark from processing too many files at the same time, so I changed the following property on my cluster of 8-core VMs:
spark.task.cpus 4
This setting is the threshold: anything lower and I get spill or OOM errors when decompressing the tarballs. With spark.task.cpus set to 4 on 8-core VMs, at most 2 tasks run concurrently per executor, so each decompression task gets a larger share of the executor memory.
But for the next stages of my pipeline, I would like to use the cluster at its full capacity by setting it back to:
spark.task.cpus 1
Currently, as a workaround, I store the intermediate results and read the data with another cluster that has the proper setting.
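For reference, here is a stripped-down sketch of what this workaround looks like; the paths, file names, and extraction logic are placeholders, not my real pipeline:

```python
# Sketch of the current workaround (names and paths are illustrative only).
#
# --- Job 1: runs on a cluster whose Spark config contains "spark.task.cpus 4" ---
# With 8-core executors this caps concurrency at 2 tasks per executor, so each
# decompression task gets roughly half of the executor memory.
import tarfile
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def extract_members(tarball_path):
    """Yield (tarball, member name, size) for each file in one tarball (placeholder logic)."""
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            if member.isfile():
                yield (tarball_path, member.name, member.size)

tarball_paths = ["/dbfs/mnt/raw/archive_001.tar.gz"]  # hypothetical input list
extracted = spark.sparkContext.parallelize(tarball_paths).flatMap(extract_members)
(spark.createDataFrame(extracted, ["tarball", "member", "size"])
     .write.mode("overwrite")
     .parquet("/mnt/intermediate/extracted"))

# --- Job 2: runs on a second cluster configured with "spark.task.cpus 1" ---
# so the follow-up stages can use all 8 cores per executor.
# df = spark.read.parquet("/mnt/intermediate/extracted")
# ... rest of the pipeline ...
```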
My question is: can I change spark.task.cpus dynamically for each stage or transformation?
Same problem with no answer:
https://stackoverflow.com/questions/40759007/dynamic-cpus-per-task-in-spark
Accepted Solutions
08-16-2023 10:58 AM
Hi @Thor,
Spark does not offer the capability to dynamically modify configuration settings such as spark.task.cpus for individual stages or transformations while the application is running. Once a configuration property is set for a Spark application, it remains constant for the application's entire execution. This is a cluster-level setting, and the only way to change it is to restart your cluster.
If you're looking for a more flexible approach to resource allocation, you could explore Spark's built-in dynamic allocation feature, enabled with spark.dynamicAllocation.enabled and combined with tuning properties like spark.executor.cores and spark.executor.memory. This lets Spark automatically adapt the number of executors to the workload. Note, however, that even with this approach you still cannot modify spark.task.cpus on a per-stage basis.
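For illustration, this is the kind of configuration meant here. It has to be set when the application (or cluster) starts; the sizing values below are placeholders, not a recommendation:

```python
# Illustrative only: these properties must be in place before the application
# starts (e.g. in the cluster's Spark config); the values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Let Spark add and remove executors based on the pending task backlog.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "8")
    # Often required when no external shuffle service is available.
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    # Executor sizing still applies uniformly to the whole application.
    .config("spark.executor.cores", "8")
    .config("spark.executor.memory", "28g")
    # spark.task.cpus is likewise fixed for the lifetime of the application.
    .config("spark.task.cpus", "1")
    .getOrCreate()
)
```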
I hope this helps.