
Is there a way to validate the values of spark configs?

Anonymous
Not applicable

We can set, for example:

spark.conf.set('aaa.test.junk.config', 99999), and then run spark.conf.get('aaa.test.junk.config'), which will return a value.

The problem occurs when incorrectly setting a similarly named property:

spark.conf.set('spark.sql.shuffle.partition', 999) ==> without the trailing 's'

Where the actual property is 'spark.sql.shuffle.partitions' ==> with a trailing 's'

Running spark.conf.get('spark.sql.shuffle.partition') ==> without the trailing 's' ==> will still return a value.

I thought I could run getAll() as a validation, but getAll() may not return properties that are explicitly defined in a notebook session.

Is there a way to check whether a config parameter I have set is actually valid or not? I don't see any error message either.
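
For readers following along, the behaviour can be consolidated into one runnable snippet (a minimal sketch, assuming an active SparkSession bound to spark, as in a Databricks notebook):

    # Spark's runtime config accepts arbitrary keys, so the typo below is stored silently.
    spark.conf.set("spark.sql.shuffle.partition", 999)     # missing the trailing 's'
    print(spark.conf.get("spark.sql.shuffle.partition"))   # -> '999', no error raised

    # The real property is untouched and keeps its default.
    print(spark.conf.get("spark.sql.shuffle.partitions"))  # -> '200' unless changed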

1 ACCEPTED SOLUTION

sajith_appukutt
Honored Contributor II

You could check the list of valid configurations here and ensure that there are no typos.
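
As a programmatic companion to checking the documentation, one option is to compare a key against the output of SET -v, which lists the documented (non-internal) SQL configurations known to the running Spark version. A minimal sketch, with the caveat that it only covers spark.sql.* properties:

    # Collect the documented SQL config keys for this Spark version.
    valid_keys = {row.key for row in spark.sql("SET -v").select("key").collect()}

    key = "spark.sql.shuffle.partition"  # the typo from the question
    if key not in valid_keys:
        raise ValueError(f"Unknown or non-standard Spark SQL config: {key!r}")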


2 REPLIES


User16857281974
Contributor

You would solve this just like we solve it for all loose string references: create a constant that represents the key you want to ensure doesn't get mistyped.

Naturally, if you type it wrong the first time, it will be wrong everywhere, but that is true for all software development. Beyond that, a simple assert will avoid regressions where someone might change your value.

The good news is that this has already been done for you, and you could simply include it in your code if you wanted to, as seen here:

[Screenshot from the original post showing the Scala example]

If you are using Python, I would go ahead and create your own constants and assertions, given that integrating with the underlying Scala code just wouldn't be worth it (in my opinion).
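
For example, a minimal sketch of that constant-plus-assert pattern in Python (the constant name is our own, not a Spark API):

    # Define the key once so it cannot be mistyped at call sites.
    SHUFFLE_PARTITIONS = "spark.sql.shuffle.partitions"

    spark.conf.set(SHUFFLE_PARTITIONS, 999)

    # spark.conf.get returns strings, so compare against '999'.
    assert spark.conf.get(SHUFFLE_PARTITIONS) == "999"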
