11-17-2021 07:00 AM
Hi, I'm running a couple of notebooks in my pipeline and I would like to set a fixed value of 'spark.sql.shuffle.partitions' - the same value for every notebook. Should I do that by adding spark.conf.set... code in each notebook (Runtime SQL configurations are per-session), or is there an easier way to set this?
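For reference, this is the per-notebook approach I mean - a minimal sketch run at the top of each notebook, where the value 200 is just a placeholder:

    # Session-scoped: applies only to this notebook's Spark session
    spark.conf.set("spark.sql.shuffle.partitions", "200")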
11-17-2021 07:30 AM
The easiest way is to set it in the Spark config under advanced options in the cluster settings.
More info here: https://docs.databricks.com/clusters/configure.html#spark-configuration
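For example, you would paste a line like this into the Spark config box (the value 200 is only an illustration):

    spark.sql.shuffle.partitions 200

Every notebook attached to that cluster then picks up the same value, so you don't need to repeat it in code.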
11-17-2021 08:01 AM
Setting it in either the notebook or the cluster will work, but the better option is the cluster's Spark config.
11-17-2021 10:21 AM
Hi @Leszek ,
Like @Hubert Dudek mentioned, I recommend adding your setting to the cluster's Spark configuration.
The difference between the notebook and the cluster is that a spark.conf.set() call in a notebook only applies to that notebook's Spark session, while a cluster-level Spark config applies to every notebook and job attached to the cluster.
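Either way, you can check the value in effect from any notebook - a quick sketch using the spark session Databricks provides:

    # Print the shuffle-partitions value currently in effect for this session
    print(spark.conf.get("spark.sql.shuffle.partitions"))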
11-17-2021 11:41 PM
11-18-2021 04:00 AM
Great that it is working. Any chance you could select this as the best answer? 🙂