Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Set environment variables in global init scripts

Marcel
New Contributor III

Hi Databricks Community,

I want to set environment variables for all clusters in my workspace.

The goal is to have environment-specific (dev, prod) values for these variables.

Instead of setting the environment variables on each cluster individually, a global init script is desired.

I tried different scripts like

  • export VARIABLE_NAME=VARIABLE_VALUE
  • echo VARIABLE_NAME=VARIABLE_VALUE >> /etc/environment

but environment variables are not available via

  • my_variable = os.environ["VARIABLE_NAME"]
  • my_variable = os.getenv("VARIABLE_NAME")

The global init script did not fail.

Any ideas?

Thank you!

1 ACCEPTED SOLUTION

Marcel
New Contributor III

Solution:

#!/bin/bash
echo export VARIABLE_NAME=VARIABLE_VALUE >> /etc/environment

The shebang line is required.

The environment variable can then be accessed via (Python):

my_variable = os.environ["VARIABLE_NAME"]
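
Building on this, a hedged sketch of how the environment-specific (dev/prod) values from the original question might be consumed in Python. The variable name `ENVIRONMENT` and the catalog names are illustrative assumptions, not real workspace objects:

```python
import os

def pick_catalog(env=None):
    """Return a catalog name for the given environment.

    Illustrative only: assumes the global init script wrote
    'export ENVIRONMENT=dev' (or prod) into /etc/environment.
    Falls back to "dev" when the variable is not set.
    """
    env = env or os.getenv("ENVIRONMENT", "dev")
    settings = {"dev": "dev_catalog", "prod": "prod_catalog"}
    return settings[env]

print(pick_catalog("prod"))  # -> prod_catalog
```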


4 REPLIES

Pholo
Contributor

Have you tried the widget utils from Databricks?

https://docs.microsoft.com/en-us/azure/databricks/notebooks/widgets

dbutils.widgets.text("database", "dev")
 
database_def = dbutils.widgets.getArgument("database")
print(database_def)

You can parametrize it when you run the scripts.

Marcel
New Contributor III

The cluster terminated after adding the first line to the global init script.

I think the widget utility is only for notebooks.

Doesn't the global init script have to be a valid sh/bash script?


brickster
New Contributor II

We have set the env variable in the global init script as below:

sudo echo DATAENV=DEV >> /etc/environment

and we try to access the variable in a notebook running in "Shared" cluster mode:

import os
print(os.getenv("DATAENV"))

But the env variable is not accessible and the value returned is None.

However, when the notebook runs on a "No Isolation Shared" cluster, the env variable is accessed successfully.

Any idea whether there is a restriction on accessing environment variables at the cluster access mode level?
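
As a defensive pattern while this is unresolved, reading the variable with a fallback avoids a KeyError on clusters where it is missing. Passing the environ mapping explicitly here is only for illustration:

```python
import os

def read_env(name, default, environ=os.environ):
    """Return environ[name] if present, else default.

    Sketch for the case above: on "Shared" access-mode clusters
    the variable may be missing, so prefer a lookup with a
    default over os.environ[name], which raises KeyError.
    """
    return environ.get(name, default)

print(read_env("DATAENV", "DEV", environ={}))                   # variable missing -> DEV
print(read_env("DATAENV", "DEV", environ={"DATAENV": "PROD"}))  # variable set -> PROD
```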
