Data Engineering

Dynamically change spark.task.cpus

Thor
New Contributor III

Hello,

I'm facing a problem with large tarballs to decompress: to make them fit in memory, I had to stop Spark from processing too many files at the same time, so I changed the following property on my cluster of 8-core VMs:

spark.task.cpus 4 

This setting is the threshold below which I get spill or OOM errors when decompressing the tarballs.

But for the next stages of my pipeline, I would like to use the cluster at its full capacity by setting it back to:

spark.task.cpus 1
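
For reference, with 8-core workers spark.task.cpus 4 means each executor can run at most 8 / 4 = 2 tasks at a time, which is what limits how many tarballs get decompressed concurrently, whereas setting it back to 1 would allow all 8 task slots per executor. A quick Python sketch (using the notebook's spark session; the fallback defaults in the calls are just illustrative assumptions) to check the effective values:

task_cpus = int(spark.conf.get("spark.task.cpus", "1"))            # Spark's default is 1 if unset
executor_cores = int(spark.conf.get("spark.executor.cores", "8"))  # assuming the 8-core VMs above
print(f"task slots per executor: {executor_cores // task_cpus}")   # 8 // 4 = 2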

Currently, as a workaround, I store the intermediate results and read the data with another cluster that has the proper setting.
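
Roughly, that workaround looks like this (the path and the follow-up transformation below are just placeholders, not my real code):

# Job 1, on the cluster with spark.task.cpus 4: persist the decompressed output
decompressed_df.write.mode("overwrite").parquet("dbfs:/tmp/decompressed_stage")

# Job 2, on a second cluster with spark.task.cpus 1: continue the pipeline at full parallelism
df = spark.read.parquet("dbfs:/tmp/decompressed_stage")
result = df.groupBy("source_file").count()   # placeholder for the remaining transformations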

My question is: can I dynamically change spark.task.cpus for each stage or transformation?

 

Same problem with no answer:

https://stackoverflow.com/questions/40759007/dynamic-cpus-per-task-in-spark

1 ACCEPTED SOLUTION


jose_gonzalez
Moderator

Hi @Thor,

Spark does not offer the capability to dynamically modify configuration settings, such as spark.task.cpus, for individual stages or transformations while the application is running. Once a configuration property is set for a Spark application, it remains constant throughout its entire execution. This is a cluster-level setting, and the only way to change it is to edit the configuration and restart your cluster.

If you're looking for a more flexible approach to resource allocation, you could explore Spark's built-in dynamic allocation feature (spark.dynamicAllocation.enabled), combined with tuning properties such as spark.executor.cores and spark.executor.memory. This combination allows Spark to automatically adapt the number of executors to the workload. Note, however, that even with this approach you still cannot modify spark.task.cpus on a per-stage basis.
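
For illustration only (the values below are placeholders, and releasing executors safely also requires shuffle tracking or an external shuffle service), these properties would typically go in the cluster's Spark config:

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.shuffleTracking.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 8
spark.executor.cores 4
spark.executor.memory 14g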

I hope this helps.

