Set environment variables in global init scripts

Marcel
New Contributor III

Hi Databricks Community,

I want to set environment variables for all clusters in my workspace.

The goal is to have environment-specific (dev, prod) values for these environment variables.

Instead of setting the environment variables on each cluster individually, a global init script is desired.

I tried different scripts, like

  • export VARIABLE_NAME=VARIABLE_VALUE
  • echo VARIABLE_NAME=VARIABLE_VALUE >> /etc/environment

but the environment variables are not available via (Python)

  • my_variable = os.environ["VARIABLE_NAME"]
  • my_variable = os.getenv("VARIABLE_NAME")

The global init script itself did not fail.
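A likely reason the plain export attempt fails: export only sets the variable in the init script's own shell process, which exits before the notebook's Python interpreter starts. A minimal sketch of the file-based attempt as a complete script (placeholder names kept from the post; note the shebang line, which the accepted solution below identifies as required):

#!/bin/bash
# export alone would only affect this script's own process;
# appending to /etc/environment makes the value visible to
# processes started afterwards, including the notebook's Python.
echo VARIABLE_NAME=VARIABLE_VALUE >> /etc/environment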

Any ideas?

Thank you!


4 REPLIES

Pholo
Contributor

Have you tried the widget utilities from Databricks?

https://docs.microsoft.com/en-us/azure/databricks/notebooks/widgets

dbutils.widgets.text("database", "dev")

database_def = dbutils.widgets.getArgument("database")
print(database_def)

You can parameterize it when you run the scripts.

Marcel
New Contributor III

The cluster terminated after adding the first line to the global init script.

I think the widget utility is only for notebooks.

Doesn't the global init script have to be a valid sh/bash script?

Marcel
New Contributor III

Accepted solution:

#!/bin/bash
echo export VARIABLE_NAME=VARIABLE_VALUE >> /etc/environment

The shebang line is required.

The environment variable can then be accessed via (Python):

import os

my_variable = os.environ["VARIABLE_NAME"]
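To get back to the original goal of environment-specific (dev, prod) values, one option is to keep one global init script per workspace and switch on a marker set when the script is installed. A rough sketch, assuming a hypothetical DEPLOY_ENV marker and DB_HOST variable (neither name comes from this thread):

#!/bin/bash
# Hypothetical: DEPLOY_ENV is set to "dev" or "prod" when this
# script is installed in the respective workspace.
DEPLOY_ENV="dev"

if [ "$DEPLOY_ENV" = "prod" ]; then
    echo export DB_HOST=prod-db.internal >> /etc/environment
else
    echo export DB_HOST=dev-db.internal >> /etc/environment
fi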

brickster
New Contributor II

We have set the env variable in the global init script as below:

sudo echo DATAENV=DEV >> /etc/environment

and we try to access the variable in a notebook that runs in "Shared" cluster mode:

import os
print(os.getenv("DATAENV"))

But the env variable is not accessible, and we get the value "None".

However, when we run the notebook on a "No Isolation Shared" cluster, the env variable is accessed successfully.

Any idea whether there is a restriction on accessing environment variables at the cluster access mode level?
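A side note on the snippet above: in sudo echo DATAENV=DEV >> /etc/environment, the redirection is performed by the calling shell, not by sudo, so the append would fail in a non-root shell (global init scripts normally run as root, so it works there). A sketch of the pattern that keeps the append itself under sudo:

#!/bin/bash
# tee -a runs under sudo, so the append to /etc/environment
# succeeds even when the calling shell is not root.
echo DATAENV=DEV | sudo tee -a /etc/environment > /dev/null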
