
Dynamically change spark.task.cpus

Thor
New Contributor III

Hello,

I'm facing a problem with large tarballs to decompress: to make them fit in memory I had to stop Spark from processing too many files at the same time, so I changed the following property on my cluster of 8-core VMs:

spark.task.cpus 4 

This value is the threshold: anything lower and I get spill or OOM errors when decompressing the tarballs.
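
For reference, here is how I sanity-check the effective setting and the resulting task concurrency from a notebook (a minimal PySpark sketch; the 8-core figure comes from my VMs):

# Read back the effective setting; it defaults to 1 when not set explicitly.
task_cpus = int(spark.conf.get("spark.task.cpus", "1"))
executor_cores = 8  # cores per VM on this cluster
# Number of tasks that can run concurrently on each executor: 8 // 4 = 2
print(f"Concurrent tasks per executor: {executor_cores // task_cpus}")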

But, for the next stages of my pipeline, I would like to use the cluster at its maximum capacity by setting it back to:

spark.task.cpus 1

Currently, as a workaround, I store the intermediate results and read the data back with another cluster that has the proper setting.
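
Roughly, the workaround looks like this (a sketch only; the DataFrame name and path are placeholders):

# On the decompression cluster (spark.task.cpus 4): persist the intermediate results.
decompressed_df.write.mode("overwrite").parquet("/mnt/tmp/decompressed")  # placeholder path

# On a second cluster (spark.task.cpus 1): read the data back at full parallelism.
df = spark.read.parquet("/mnt/tmp/decompressed")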

My question is: can I change spark.task.cpus dynamically for each stage or transformation?

 

Same problem with no answer:

https://stackoverflow.com/questions/40759007/dynamic-cpus-per-task-in-spark

1 ACCEPTED SOLUTION


jose_gonzalez
Databricks Employee

Hi @Thor,

Spark does not offer the capability to dynamically modify configuration settings such as spark.task.cpus for individual stages or transformations while the application is running. Once a configuration property is set for a Spark application, it remains constant throughout its entire execution. This is a cluster-level setting, and the only way to change it is to restart your cluster with a new value.

If you're seeking a more flexible approach to resource allocation, you could explore Spark's built-in dynamic allocation feature, enabled via spark.dynamicAllocation.enabled. It can be combined with tuning properties like spark.executor.cores and spark.executor.memory, which lets Spark automatically adapt the number of executors to the workload. Note, however, that even with this approach you still cannot modify spark.task.cpus on a per-stage basis.
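
For illustration, such a setup could look roughly like this in the cluster's Spark config (the values are placeholders and depend on your workload and VM size):

spark.dynamicAllocation.enabled true
spark.executor.cores 8
spark.executor.memory 28g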

I hope this helps.

