Make environment variables defined in init script available to Spark JVM job?

Rahul2025
New Contributor III

Hi,

We're using Databricks Runtime 11.3 LTS and executing a Spark Java job on a job cluster. To automate the execution of this job, we need to define some environment variables (sourced from bash config files) in a cluster-scoped init script and make them available to the Spark Java job.

The environment variables are set correctly within the init script; however, they don't become available to the Spark Java job. We tried updating the /etc/environment file as described at the following location, but that didn't help:

https://community.databricks.com/s/topic/0TO3f000000Ciy8GAC/variables
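For reference, here is a simplified sketch of the kind of cluster-scoped init script we're using (the config file path and variable names are only placeholders):

#!/bin/bash
# Cluster-scoped init script (sketch): source the dynamically generated config,
# then try to persist the variables for processes started later.
source /dbfs/FileStore/config/job_env.sh   # placeholder path; exports e.g. MY_ENV_VAR

# Attempt described above: append the variable to /etc/environment
echo "MY_ENV_VAR=${MY_ENV_VAR}" >> /etc/environment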

Please let us know if we're missing anything here. Any help would be highly appreciated.

Thanks in advance!

Regards,

// Rahul

4 REPLIES

Anonymous
Not applicable

To make environment variables defined in an init script available to a Spark JVM job, you can pass their values to the Spark job as command-line arguments or as JVM system properties.

Here's an example of passing environment variables as JVM system properties:

spark-submit --master yarn --deploy-mode client \
  --conf spark.driver.extraJavaOptions=-Denv_var_1=$env_var_1 \
  --conf spark.executor.extraJavaOptions=-Denv_var_2=$env_var_2 \
  /path/to/your/SparkJob.jar
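With this approach the values arrive inside the job as JVM system properties rather than OS environment variables, so the Java code would read them with System.getProperty("env_var_1") rather than System.getenv.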

Rahul2025
New Contributor III

Thank you for your response.

This is my understanding; please correct me if I'm wrong. When a Spark job is triggered (using the spark-submit or JAR option), whether from the UI or via the REST API, the configured init scripts are executed on the cluster node(s) before the Spark job runs. So from within an init script we have no way to supply spark-submit command-line arguments. We dynamically populate environment variables in configuration files that are sourced by the init scripts and should then be available to the Spark job. The sourcing part works fine, but these environment variables are not becoming visible to the Java Spark job.

Rahul2025
New Contributor III

Thank you for your response. Another option is to have the init script write the variables into a file under the following directory:

/databricks/spark/conf/
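Something like the following from the init script might work (a sketch only; the config file path is a placeholder, and I'm assuming the Spark driver and executor processes pick up spark-env.sh from that directory when they start, which I haven't verified on 11.3 LTS):

#!/bin/bash
# Init script sketch: re-export the sourced variables for the Spark JVM
# by appending them to spark-env.sh (the filename is an assumption).
source /dbfs/FileStore/config/job_env.sh   # placeholder path; exports e.g. MY_ENV_VAR

cat >> /databricks/spark/conf/spark-env.sh <<EOF
export MY_ENV_VAR="${MY_ENV_VAR}"
EOF

If that works, the Java job would then read the value with System.getenv("MY_ENV_VAR").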

Anonymous
Not applicable

Hi @Rahul K

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
