
Access an environment variable from a custom container-based cluster

lukaszl
New Contributor III

Hi Databricks Community,

I want to set environment variables for all clusters in my workspace. The goal is to have the environment variable available in all notebooks executed on the cluster.

The environment variable is generated in a global init script and stored in `/etc/environment`, as documented here: https://community.databricks.com/s/question/0D58Y000096UKm5SAG/set-environment-variables-in-global-i...

After my init script runs, the `/etc/environment` content looks like this:

CLUSTER_DB_HOME=/databricks
DATABRICKS_RUNTIME_VERSION=10.4
DB_HOME=/databricks
DEFAULT_DATABRICKS_ROOT_VIRTUALENV_ENV=/databricks/python3
MLFLOW_CONDA_HOME=/databricks/conda
MLFLOW_PYTHON_EXECUTABLE=/databricks/python/bin/python
MLFLOW_TRACKING_URI=databricks
PYARROW_IGNORE_TIMEZONE=1
export MY_TEST_VAR=test

This works for standard clusters, and I can use the variable in notebooks.

But for clusters with a custom Docker container defined, the environment variable is not visible.

By custom Docker container clusters, I mean clusters with the "Use your own Docker container" option set. For that type of cluster I can't access the environment variable. For example, the result of the code

import os
print(os.getenv('MY_TEST_VAR'))

is empty (None).

Any ideas on where I need to store environment variables so that they are available on all cluster types?

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

There is the Spark conf, which you can set on cluster creation or even in the notebook.

No idea how that would work in Docker, though.

View solution in original post

5 REPLIES

-werners-
Esteemed Contributor III

I suppose in the Dockerfile of your image.

lukaszl
New Contributor III

Thank you for the quick answer!

Unfortunately, I'm going to store a dynamic access token value in that environment variable. The token needs to be generated per cluster and expires after 4 hours, so its value is known only at cluster setup. That is why I used the init script; Docker image creation would be too early.

Does Databricks have an environment variable store that is injected before each notebook run? That would be the perfect solution for me.

-werners-
Esteemed Contributor III

There is the Spark conf, which you can set on cluster creation or even in the notebook.

No idea how that would work in Docker, though.
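
As an illustration of the Spark conf approach, here is a minimal sketch. The key name spark.my.test.var and its value are made up for this example; the idea is to add the key/value pair under the cluster's Spark config (Advanced options > Spark) when creating the cluster, or set it from the notebook, and then read it back through the SparkSession.

from pyspark.sql import SparkSession

# In a Databricks notebook the `spark` session already exists; getOrCreate() just reuses it.
spark = SparkSession.builder.getOrCreate()

# Read a value added to the cluster's Spark config, e.g. the line "spark.my.test.var test".
# The second argument is a default returned when the key is not set.
print(spark.conf.get("spark.my.test.var", None))

# A runtime conf can also be set (or overridden) from the notebook itself.
spark.conf.set("spark.my.test.var", "test")
print(spark.conf.get("spark.my.test.var"))

Note that, unlike an entry in /etc/environment, this is not an OS-level environment variable, so it is read via spark.conf rather than os.getenv.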

lukaszl
New Contributor III

Thank you @Werner Stinckens!

Based on your suggestion, I found the Databricks internal file

/databricks/spark/conf/spark-env.sh

which contains the environment variables visible in the notebook. Adding the new variable to this file solves my problem.

grazie
Contributor

Thanks @Lukasz Lu - that worked for me as well. When I used the following script:

#!/bin/bash
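# tee -a appends the line to /etc/environment, and its stdout is then appended to spark-env.sh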
echo MY_TEST_VAR=value1 | tee -a /etc/environment >> /databricks/spark/conf/spark-env.sh
  • for non-Docker clusters, MY_TEST_VAR shows up twice in `/databricks/spark/conf/spark-env.sh`
  • for Docker clusters, MY_TEST_VAR shows up once
  • in both cases, `os.getenv("MY_TEST_VAR")` gives value1 as expected
