11-17-2021 07:00 AM
Hi, I'm running a couple of notebooks in my pipeline and I would like to set a fixed value of 'spark.sql.shuffle.partitions' - the same value for every notebook. Should I do that by adding spark.conf.set... code in each notebook (runtime SQL configurations are per-session), or is there another, easier way to set this?
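For reference, this is the per-notebook approach I mean (a minimal sketch; 200 is just a placeholder value, not my actual setting):

# Would have to be repeated at the top of every notebook in the pipeline;
# in Databricks notebooks the `spark` session object is predefined
spark.conf.set("spark.sql.shuffle.partitions", "200")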
11-17-2021 07:30 AM
The easiest way is to use the Spark config under the advanced options in the cluster settings:
More info here: https://docs.databricks.com/clusters/configure.html#spark-configuration
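For example, assuming you want 200 shuffle partitions (pick whatever value fits your workload), you would add one key-value pair per line, key and value separated by a space, in the cluster's Spark config box:

spark.sql.shuffle.partitions 200

Every notebook attached to the cluster then picks this up as its session default.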
11-17-2021 08:01 AM
Setting it either in the notebook or on the cluster should work, but the better option is to go with the cluster's Spark config.
11-17-2021 10:21 AM
Hi @Leszek,
Like @Hubert Dudek mentioned, I recommend adding your setting to the cluster's Spark configuration.
The difference between the notebook and the cluster is that a spark.conf.set call in a notebook applies only to that notebook's Spark session and has to be repeated in every notebook, while a value in the cluster's Spark config becomes the default for every session (and therefore every notebook) attached to that cluster.
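For example, you can check which value is actually in effect from any notebook (a minimal sketch; `spark` is the predefined session object in Databricks notebooks):

# Returns the session value if one was set with spark.conf.set in this notebook,
# otherwise the cluster-level default from the Spark config
print(spark.conf.get("spark.sql.shuffle.partitions"))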
11-17-2021 11:41 PM
11-18-2021 04:00 AM
Great that it is working! Any chance of marking this as the best answer? 🙂