Dynamically change spark.task.cpus

Thor
New Contributor III

Hello,

I'm facing a problem with big tarballs to decompress: to make them fit in memory I had to stop Spark from processing too many files at the same time, so I changed the following property on my cluster of 8-core VMs:

spark.task.cpus 4 

This value is the threshold below which I get spill or OOM errors when decompressing the tarballs.

But for the next stages of my pipeline I would like to use the cluster at its full capacity by setting it back to:

spark.task.cpus 1

Currently, as a workaround, I store the intermediate results and read them back with another cluster that has the proper setting.
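Roughly, the workaround looks like this (extract_tarballs, downstream_transformations, and the paths are placeholders for my actual logic):

# Job 1 -- runs on the cluster with spark.task.cpus = 4 and persists the
# decompressed data so the memory-heavy step is isolated.
raw = spark.read.format("binaryFile").load("/mnt/raw/tarballs")          # placeholder path
extracted = extract_tarballs(raw)                                        # placeholder helper
extracted.write.mode("overwrite").parquet("/mnt/intermediate/extracted")

# Job 2 -- runs on a second cluster with spark.task.cpus = 1 and continues
# the pipeline at full parallelism.
df = spark.read.parquet("/mnt/intermediate/extracted")
result = downstream_transformations(df)                                  # placeholder helper
result.write.mode("overwrite").parquet("/mnt/output/final")              # placeholder path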

My question is: can I dynamically change spark.task.cpus for each stage or transformation?

 

Same problem with no answer:

https://stackoverflow.com/questions/40759007/dynamic-cpus-per-task-in-spark

1 ACCEPTED SOLUTION

jose_gonzalez
Moderator

Hi @Thor,

Spark does not offer the capability to dynamically modify configuration settings such as spark.task.cpus for individual stages or transformations while the application is running. Once a configuration property is set for a Spark application, it remains constant throughout its entire execution. This is a cluster-level setting, and the only way to change it is to restart your cluster.
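For reference, on open-source Spark the property has to be fixed before the SparkContext is created, along these lines (illustrative sketch only; on Databricks the equivalent is the cluster's Spark config, which also only takes effect after a restart):

from pyspark.sql import SparkSession

# spark.task.cpus must be set before the SparkContext starts;
# setting it on a running application has no effect.
spark = (
    SparkSession.builder
    .appName("tarball-pipeline")       # illustrative name
    .config("spark.task.cpus", "4")    # each task reserves 4 of the 8 cores
    .getOrCreate()
)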

If you're looking for a more flexible approach to resource allocation, you could explore Spark's built-in dynamic allocation feature, enabled via spark.dynamicAllocation.enabled. It can be combined with tuning properties like spark.executor.cores and spark.executor.memory, which lets Spark automatically adapt the number of executors to the workload. Note, however, that even with this approach you still cannot modify spark.task.cpus on a per-stage basis.
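As a rough sketch (values are placeholders and must also be set before the application starts; they are not per-stage settings):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Illustrative dynamic allocation settings -- tune the values to the workload.
conf = (
    SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")  # needed without an external shuffle service (Spark 3+)
    .set("spark.executor.cores", "4")     # placeholder
    .set("spark.executor.memory", "16g")  # placeholder
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()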

I hope this helps.

