Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Setting Dynamic Local Configuration Properties

somedeveloper
New Contributor

It seems that Databricks is somehow setting local Spark configuration properties for each notebook. Can someone point me to exactly how and where this is being done?

 

I would like the scheduler to use a certain pool by default, but the property is being set to some dynamic integer value, so jobs end up using FIFO rather than the default pool. The only workaround at the moment seems to be setting the property manually in each notebook, either with explicit code or by importing code that does it. I'd really prefer something I can make automatic, like a configuration file.
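For reference, this is the kind of per-notebook boilerplate I'm trying to avoid (the pool name "low" here is just an example from our setup):

spark.sparkContext.setLocalProperty("spark.scheduler.pool", "low")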

3 REPLIES

BigRoux
Databricks Employee

You will need to leverage cluster-level Spark configuration or a global init script. That will let you set the "spark.scheduler.pool" property automatically for all workloads on the cluster.

You can try navigating to "Compute", selecting the cluster you want to modify, clicking Edit, and expanding "Advanced Options." Under the "Spark" tab, add "spark.scheduler.pool" with your pool name.
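For example, the Spark config box takes one key-value pair per line, separated by a space. If your pool is named "low" (just an example name), the entry would be:

spark.scheduler.pool low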

You can verify the setting by running the following command in a notebook attached to your cluster:

print(spark.conf.get("spark.scheduler.pool"))

Cheers, Louis.

Morning Louis,

I do have the global configuration set to use the scheduler pool, but the local property is not mirroring the global property for some reason. The global configuration sets the scheduler pool to low, but the local property shows a string of digits similar to '1944239804544138415', and the value is different for every notebook. I had initially thought of using an init script to set the value, but because init scripts run before Spark starts, I'm unable to set any local properties from one. I'm not aware of anything on our side that dynamically sets local values, but if you could verify that local settings normally mirror the global settings by default, I'll check back with our Databricks team to make sure this isn't something coming from our end.
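For reference, here is roughly what the check looks like in one of our notebooks, with the values we actually see in the comments:

print(spark.conf.get("spark.scheduler.pool"))  # 'low' - the global value, as expected
print(spark.sparkContext.getLocalProperty("spark.scheduler.pool"))  # '1944239804544138415' - differs per notebook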

The behavior you're observing—where the local property for spark.scheduler.pool is being set to a dynamic integer value rather than mirroring the global configuration—is not the default behavior of Spark or Databricks. Normally, global Spark configurations (e.g., set at the cluster level) should propagate to individual sessions unless explicitly overridden. 

Is it possible there is a local override you are not aware of? Everything you are reporting points to something in your environment programmatically overriding your global setting. Local settings override global settings; that is the pecking order.

Test for local settings: print(spark.sparkContext.getLocalProperty("spark.scheduler.pool"))

Test for global settings: print(spark.conf.get("spark.scheduler.pool"))

You may also want to look for any shared libraries, init scripts, or notebook templates that might include calls to "setLocalProperty".

You can also look at the cluster logs for any evidence of dynamic property assignments.
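If it helps, you can also dump every local property that has been set in a given notebook. This goes through Spark's internal Java API via py4j, so treat it as a debugging aid rather than a stable interface:

# Read the java.util.Properties holding this session's local properties
props = spark.sparkContext._jsc.sc().getLocalProperties()
for name in props.stringPropertyNames():
    print(name, "=", props.getProperty(name))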

Cheers, Louis.
