01-11-2024 09:32 AM
I'm trying to use the Global Init Scripts in Databricks to set an environment variable to use in a Delta Live Table Pipeline. I want to be able to reference a value passed in as a path versus hard coding it. Here is the code for my pipeline:
CREATE STREAMING LIVE TABLE data
COMMENT "Raw data in delta format"
TBLPROPERTIES ("quality" = "bronze", "pipelines.autoOptimize.zOrderCols" = "id")
AS
SELECT *, id FROM cloud_files(
"${TEST_VAR}data/files", "json", map("cloudFiles.inferColumnTypes", "true")
)However, when I set up a global init script like below, it doesn't appear on the list of environment variables on the job compute cluster.
#!/bin/sh
sudo echo TEST_VAR=TESTING >> /etc/environment
Is this because the cluster type is PIPELINE? Is what I am attempting to do possible? Are global init scripts even run when using Delta Live Table pipelines, and can environment variables be referenced in SQL-style pipelines? I am finding little documentation about this online.
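(Editor's note: a likely simpler route, assuming the goal is just parameterizing the path. In Delta Live Tables SQL, ${key} references are substituted from the pipeline's configuration settings, not from OS environment variables, so the value can be supplied in the pipeline's JSON settings with no init script at all. A minimal sketch, where the pipeline name and path value are placeholders and TEST_VAR is the key from the question:

```json
{
  "name": "my-dlt-pipeline",
  "configuration": {
    "TEST_VAR": "/mnt/landing/"
  }
}
```

With that in place, "${TEST_VAR}data/files" in the pipeline SQL above would resolve to "/mnt/landing/data/files".)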
01-18-2024 07:06 AM
I was able to accomplish this by creating a Cluster Policy that put in place the scripts, config settings, and environment variables I needed.
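(Editor's note: for readers wondering what such a policy looks like, a cluster policy definition can pin an environment variable with a fixed attribute. A rough sketch, reusing the placeholder TEST_VAR/TESTING values from the question; consult the Databricks cluster policy reference for the exact attribute paths supported for pipeline clusters:

```json
{
  "spark_env_vars.TEST_VAR": {
    "type": "fixed",
    "value": "TESTING"
  }
}
```

The policy is then attached to the pipeline's compute so every cluster it creates inherits the variable.)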
01-12-2024 07:56 AM
Hi, could you please check whether you are using shared access cluster mode?
01-16-2024 01:22 PM
Hi @ac0, could you please refer to the doc here: https://docs.databricks.com/en/init-scripts/environment-variables.html
There is also a doc that covers the same topic: https://community.databricks.com/t5/data-engineering/set-environment-variables-in-global-init-script...