01-12-2023 02:15 AM
Hi Databricks Community,
I want to set environment variables for all clusters in my workspace. The goal is to have the environment variable available in all notebooks executed on the cluster.
The environment variable is generated in a global init script and stored in `/etc/environment`, as documented here: https://community.databricks.com/s/question/0D58Y000096UKm5SAG/set-environment-variables-in-global-i...
After my init script runs, the `/etc/environment` content looks like this:
CLUSTER_DB_HOME=/databricks
DATABRICKS_RUNTIME_VERSION=10.4
DB_HOME=/databricks
DEFAULT_DATABRICKS_ROOT_VIRTUALENV_ENV=/databricks/python3
MLFLOW_CONDA_HOME=/databricks/conda
MLFLOW_PYTHON_EXECUTABLE=/databricks/python/bin/python
MLFLOW_TRACKING_URI=databricks
PYARROW_IGNORE_TIMEZONE=1
export MY_TEST_VAR=test
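For reference, a minimal sketch of the kind of global init script that appends such a line; only the exported MY_TEST_VAR line comes from this setup, and the token-generation command is a hypothetical placeholder:
#!/bin/bash
# Sketch of a global init script: append an export line to /etc/environment
# so the variable is picked up by shells that source that file.
# "generate-token" is a placeholder for however the value is actually produced.
MY_TEST_VAR="test"   # e.g. MY_TEST_VAR="$(generate-token)"
echo "export MY_TEST_VAR=${MY_TEST_VAR}" >> /etc/environment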
This works for standard clusters, and I can use the variable in notebooks.
BUT for clusters with a custom Docker container, the environment variable is not visible.
By custom Docker container clusters I mean clusters with the option "Use your own Docker container" enabled. On that type of cluster I can't access the environment variable. E.g. the result of the code
import os
print(os.getenv('MY_TEST_VAR'))
is empty (None).
Any ideas on where I need to store environment variables so that they are available on all cluster types?
Thank you!
01-12-2023 02:26 AM
I suppose in the dockerfile of your image.
01-13-2023 03:15 AM
Thank you for the quick answer!
Unfortunately, I'm going to store a dynamic access token in that environment variable. The token needs to be generated per cluster and expires after 4 hours, so the value is only known at cluster setup. That is why I used the init script; generating it at Docker image build time would be too early.
Does Databricks have an environment variable store that is injected before each notebook run? That would be the perfect solution for me.
01-13-2023 05:48 AM
There is the Spark conf, which you can set on cluster creation or even in the notebook.
No idea how that would work in Docker, though.
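For illustration, a sketch of setting a Spark conf entry at cluster creation via the Clusters API; the key spark.myapp.token, the node type, and the host/token variables are placeholders, not from this thread:
#!/bin/bash
# Sketch: create a cluster with a custom Spark conf entry via the Clusters API.
# $DATABRICKS_HOST and $DATABRICKS_TOKEN are assumed to hold the workspace URL and a PAT.
curl -s -X POST "$DATABRICKS_HOST/api/2.0/clusters/create" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "cluster_name": "my-cluster",
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 1,
        "spark_conf": { "spark.myapp.token": "value1" }
      }'
In a notebook the value could then be read with spark.conf.get("spark.myapp.token").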
01-16-2023 01:00 AM
Hi @Lukasz Lu, we haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if his suggestions helped you.
If you have found a solution, please share it with the community, as it can be helpful to others.
Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
01-16-2023 01:19 AM
Thank you @Werner Stinckens !
Based on your suggestion I found the Databricks internal file
/databricks/spark/conf/spark-env.sh
which contains the environment variables visible in the notebook. Adding the new variable to this file solves my problem.
01-23-2023 12:41 PM
Thanks @Lukasz Lu - that worked for me as well. I used the following script:
#!/bin/bash
# Append the variable both to /etc/environment and to spark-env.sh,
# so it is visible at the OS level and inside notebooks.
echo MY_TEST_VAR=value1 | tee -a /etc/environment >> /databricks/spark/conf/spark-env.sh
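Since this runs as an init script, the new variable only shows up after the cluster is (re)started; it can then be checked in a notebook with os.getenv('MY_TEST_VAR') as in the original question.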