Make environment variables defined in init script available to Spark JVM job?

Rahul2025
New Contributor III

Hi,

We're using Databricks Runtime 11.3 LTS and executing a Spark Java job on a job cluster. To automate the execution of this job, we need to define some environment variables (sourced from bash config files) in a cluster-scoped init script and make them available to the Spark Java job.

The environment variables are set correctly within the init script; however, they don't become available to the Spark Java job. We tried updating the /etc/environment file as described at the following location, but that didn't help:

https://community.databricks.com/s/topic/0TO3f000000Ciy8GAC/variables
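For reference, here is a simplified sketch of the kind of cluster-scoped init script we're using (the config file path and variable names are only placeholders):

#!/bin/bash
# Cluster-scoped init script (sketch): source the dynamically generated config,
# then try to persist the variables for processes started later.
source /dbfs/FileStore/config/job_env.sh   # placeholder path; exports e.g. MY_ENV_VAR

# Attempt described above: append the variable to /etc/environment
echo "MY_ENV_VAR=${MY_ENV_VAR}" >> /etc/environment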

Please let us know if we're missing anything here. Any help would be highly appreciated.

Thanks in advance!

Regards,

// Rahul

4 REPLIES

Anonymous
Not applicable

To make environment variables defined in an init script available to a Spark JVM job, you can pass their values to the Spark job as command-line arguments or as JVM system properties.

Here's an example of passing environment variables as JVM system properties:

spark-submit --master yarn --deploy-mode client \
  --conf spark.driver.extraJavaOptions=-Denv_var_1=$env_var_1 \
  --conf spark.executor.extraJavaOptions=-Denv_var_2=$env_var_2 \
  /path/to/your/SparkJob.jar
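With this approach the values arrive inside the job as JVM system properties rather than OS environment variables, so the Java code would read them with System.getProperty("env_var_1") rather than System.getenv.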

Rahul2025
New Contributor III

Thank you for your response.

This is my understanding; please correct me if I'm wrong. When a Spark job is triggered (using the spark-submit or JAR option), whether from the UI or via the REST API, the configured init scripts are executed on the cluster node(s) before the Spark job runs. So from within an init script we have no way to supply spark-submit command-line arguments. We dynamically populate environment variables in configuration files that are sourced by the init scripts and should then be available to the Spark job. The sourcing part works fine, but these environment variables are not becoming visible to the Java Spark job.

Rahul2025
New Contributor III

Thank you for your response. Another option is to have the init script write the variables into a file under the following directory:

/databricks/spark/conf/
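Something like the following from the init script might work (a sketch only; the config file path is a placeholder, and I'm assuming the Spark driver and executor processes pick up spark-env.sh from that directory when they start, which I haven't verified on 11.3 LTS):

#!/bin/bash
# Init script sketch: re-export the sourced variables for the Spark JVM
# by appending them to spark-env.sh (the filename is an assumption).
source /dbfs/FileStore/config/job_env.sh   # placeholder path; exports e.g. MY_ENV_VAR

cat >> /databricks/spark/conf/spark-env.sh <<EOF
export MY_ENV_VAR="${MY_ENV_VAR}"
EOF

If that works, the Java job would then read the value with System.getenv("MY_ENV_VAR").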

Anonymous
Not applicable

Hi @Rahul K

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
