Hello,
I'm facing a problem with large tarballs to decompress. To make them fit in memory, I had to limit how many files Spark processes at the same time, so I changed the following property on my cluster of 8-core VMs:
spark.task.cpus 4
This value is the threshold: with anything lower I get spill or OOM errors when decompressing the tarballs.
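For context, this is roughly how the decompression job is set up today (just a sketch: the application name, the input path, and the binaryFiles read are placeholders standing in for my actual code):

import org.apache.spark.sql.SparkSession

// spark.task.cpus=4 on 8-core executors means at most 8/4 = 2 concurrent
// tasks per executor, which is what keeps the decompression within memory.
val spark = SparkSession.builder()
  .appName("decompress-tarballs")      // placeholder name
  .config("spark.task.cpus", "4")      // fixed for the whole application
  .getOrCreate()

// Illustrative read: binaryFiles yields (path, stream) pairs per archive.
val tarballs = spark.sparkContext.binaryFiles("hdfs:///data/tarballs/*.tar.gz")  // placeholder path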
But for the next stages of my pipeline, I would like to use the cluster at its full capacity by setting it back to:
spark.task.cpus 1
Currently, as a workaround, I have to store the intermediate results and read the data back with another cluster that has the proper setting.
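Concretely, the workaround looks something like the sketch below, run as two separate applications; extractTarballs is a hypothetical helper standing in for my decompression logic, and the paths are placeholders:

import org.apache.spark.sql.SparkSession

// Application 1, submitted with spark.task.cpus=4: decompress and persist.
val extractJob = SparkSession.builder().appName("extract-step").getOrCreate()
val extracted = extractTarballs(extractJob)  // hypothetical helper for the decompression logic
extracted.write.mode("overwrite").parquet("hdfs:///tmp/pipeline/extracted")  // placeholder path

// Application 2, submitted on another cluster with spark.task.cpus=1:
// read the intermediate results back and continue at full parallelism.
val downstreamJob = SparkSession.builder().appName("downstream-step").getOrCreate()
val intermediate = downstreamJob.read.parquet("hdfs:///tmp/pipeline/extracted")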
My question is: can I change spark.task.cpus dynamically for each stage or transformation?
Same problem with no answer:
https://stackoverflow.com/questions/40759007/dynamic-cpus-per-task-in-spark