Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is there a way to validate the values of spark configs?

Anonymous
Not applicable

We can set for example:

spark.conf.set('aaa.test.junk.config', 99999), and then run spark.conf.get('aaa.test.junk.config'), which will return the value.

The problem occurs when you incorrectly set a similarly named property.

spark.conf.set('spark.sql.shuffle.partition', 999) ==> without the trailing 's'

Where the actual property is 'spark.sql.shuffle.partitions' ==> has a trailing 's'

Running spark.conf.get('spark.sql.shuffle.partition') will return a value ==> without the trailing 's'

I thought I could run getAll() as a validation, but getAll() may not return properties that were explicitly set in a notebook session.

Is there a way to check whether the config parameter I have used is actually valid or not? I don't see any error message either.

1 ACCEPTED SOLUTION

Accepted Solutions

sajith_appukutt
Honored Contributor II

You could check the list of valid Spark configuration properties here and ensure that there are no typos.


2 REPLIES


User16857281974
Contributor

You would solve this the same way we solve it for any loose string reference: create a constant that represents the key you want to protect from being mistyped.

Naturally, if you type it wrong the first time, it will be wrong everywhere, but that is true of all software development. Beyond that, a simple assert will prevent regressions if someone later changes your value.

The good news is that this has already been done for you on the Scala side (Spark's internal SQLConf object, for instance, defines constants for these keys), and you could simply include it in your code if you wanted to.

If you are using Python, I would go ahead and create your own constants and assertions, given that integrating with the underlying Scala code just wouldn't be worth it (in my opinion).
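A minimal Python sketch of this constants-plus-assert approach. The class and helper names are my own, not part of any Spark API; the helper takes any object with set/get methods, such as spark.conf in a notebook.

```python
class SparkConfKeys:
    """Central place for config-key strings, so a typo can only live in one spot."""
    SHUFFLE_PARTITIONS = "spark.sql.shuffle.partitions"
    ADAPTIVE_ENABLED = "spark.sql.adaptive.enabled"

def set_and_verify(conf, key, value):
    """Set a config and read it straight back, failing fast if it did not apply."""
    conf.set(key, value)
    # Spark stores conf values as strings, so compare against str(value).
    assert conf.get(key) == str(value), f"{key} was not applied"

# In a notebook you would call, e.g.:
# set_and_verify(spark.conf, SparkConfKeys.SHUFFLE_PARTITIONS, 200)
```

Because every call site references SparkConfKeys.SHUFFLE_PARTITIONS rather than retyping the string, the "missing trailing 's'" mistake from the question can no longer creep in silently.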
