11-17-2021 07:00 AM
Hi, I'm running a couple of notebooks in my pipeline and I would like to set a fixed value for 'spark.sql.shuffle.partitions', the same value for every notebook. Should I do that by adding spark.conf.set... code in each notebook (runtime SQL configurations are per-session), or is there another, easier way to set this?
11-17-2021 07:30 AM
The easiest way is to use the Spark config under Advanced Options in the cluster settings.
More info here: https://docs.databricks.com/clusters/configure.html#spark-configuration
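For example, assuming you wanted a value of 200 (a hypothetical number, pick whatever fits your workload), the entry in the cluster's Spark config box would be a space-separated key/value pair like:

```
spark.sql.shuffle.partitions 200
```

Every notebook attached to that cluster then picks up this value automatically, with no per-notebook code needed.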
11-17-2021 08:01 AM
Setting it in either the notebook or the cluster should work, but the better option is the cluster's Spark config.
11-17-2021 10:21 AM
Hi @Leszek ,
Like @Hubert Dudek mentioned, I recommend adding your setting to the cluster's Spark configuration.
The difference between the notebook and the cluster is that a setting applied with spark.conf.set in a notebook is per-session, so it has to be repeated in every notebook, while a cluster Spark configuration applies to every session running on that cluster.
11-18-2021 04:00 AM
Great that it is working. Any chance to be selected as best answer? 🙂