11-17-2021 07:00 AM
Hi, I'm running a couple of notebooks in my pipeline and I would like to set a fixed value for 'spark.sql.shuffle.partitions', the same value for every notebook. Should I do that by adding spark.conf.set... code in each notebook (runtime SQL configurations are per-session), or is there an easier way to set this?
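For reference, this is a minimal sketch of the per-notebook approach I'd like to avoid repeating (the value 200 is just an example; spark is the SparkSession that Databricks notebooks provide automatically):

```python
# Per-session setting: has to be repeated at the top of every notebook,
# because runtime SQL configurations apply only to the current SparkSession.
spark.conf.set("spark.sql.shuffle.partitions", "200")
```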
11-17-2021 07:30 AM
The easiest way is to set it in the Spark config under Advanced Options in the cluster settings.
More info here: https://docs.databricks.com/clusters/configure.html#spark-configuration
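For example, in the cluster's Advanced Options > Spark > Spark config box you add one key-value pair per line, separated by a space (200 is just an example value):

```
spark.sql.shuffle.partitions 200
```

The setting is then applied when the cluster starts, so every notebook and job attached to the cluster picks it up without any spark.conf.set calls.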
11-17-2021 08:01 AM
Setting it in either the notebook or the cluster will work, but the better option is the cluster's Spark config.
11-17-2021 10:21 AM
Hi @Leszek,
Like @Hubert Dudek mentioned, I recommend adding your setting to the cluster's Spark configuration.
The difference between the notebook and the cluster approach is that a spark.conf.set() call in a notebook applies only to that notebook's session, while a setting in the cluster's Spark config is applied at startup to every notebook and job attached to the cluster.
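As a rough sketch, assuming the cluster's Spark config contains the line spark.sql.shuffle.partitions 200, you can verify the effective value from any attached notebook:

```python
# Read the effective session value; with no notebook-level override,
# this reflects what the cluster's Spark config set at startup.
current = spark.conf.get("spark.sql.shuffle.partitions")
print(current)  # expected: "200" if the cluster-level setting took effect

# A notebook-level spark.conf.set() would shadow the cluster value,
# but only for that notebook's session.
```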
11-17-2021 11:41 PM
Hi, thank you all for the tips. I tried setting this option in the Spark config before, but it didn't work for some reason. Today I tried again and it's working :).
11-18-2021 04:00 AM
Great that it's working. Any chance of marking this as the best answer? 🙂