11-17-2021 07:00 AM
Hi, I'm running a couple of notebooks in my pipeline and I would like to set a fixed value for 'spark.sql.shuffle.partitions', the same value for every notebook. Should I do that by adding spark.conf.set... code in each notebook (runtime SQL configurations are per-session), or is there another, easier way to set this?
11-17-2021 07:30 AM
The easiest way is to use the Spark config under Advanced Options in the cluster settings.
More info here: https://docs.databricks.com/clusters/configure.html#spark-configuration
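For example, assuming you wanted a value of 200 (a hypothetical number, pick whatever fits your workload), the entry in the cluster's Spark config box would be a space-separated key/value pair like:

```
spark.sql.shuffle.partitions 200
```

Every notebook attached to that cluster then picks up this value automatically, with no per-notebook code needed.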
11-17-2021 08:01 AM
Setting it in either the notebook or the cluster should work, but the better option is the cluster's Spark config.
11-17-2021 10:21 AM
Hi @Leszek ,
Like @Hubert Dudek mentioned, I recommend adding your setting to the cluster's Spark configuration.
The difference between the notebook and the cluster is that a setting applied with spark.conf.set in a notebook is per-session, so it has to be repeated in every notebook, while a cluster Spark configuration applies to every session running on that cluster.
11-18-2021 04:00 AM
Great that it is working. Any chance to be selected as best answer? 🙂