Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to provide env variables to a pipeline task

yit337
Contributor

Hello,

I create a job with one pipeline task through DAB. Now I want to provide a variable to it, but it is dynamic based on the target environment. As pipeline tasks do not support widgets, how can I provide this variable to the pipeline?

1 ACCEPTED SOLUTION

Accepted Solutions

SteveOstrowski
Databricks Employee

Hi @yit337,

Since pipeline tasks in Databricks Jobs do not support widgets the way notebook tasks do, the recommended approach is to use pipeline configuration parameters. These are key-value pairs you set in the pipeline definition, and you can make them dynamic per environment using Databricks Asset Bundle (DAB) variables and target overrides.

Here is how it works end to end:


STEP 1: DEFINE VARIABLES IN YOUR BUNDLE

In your databricks.yml, declare a variable with a default value and override it per target:

variables:
  my_env_setting:
    description: "Environment-specific setting for my pipeline"
    default: "dev_value"

targets:
  dev:
    default: true
    variables:
      my_env_setting: "dev_value"
  staging:
    variables:
      my_env_setting: "staging_value"
  prod:
    variables:
      my_env_setting: "prod_value"


STEP 2: PASS THE VARIABLE INTO YOUR PIPELINE CONFIGURATION

In your pipeline resource definition, reference the bundle variable using the ${var.<name>} syntax inside the configuration block:

resources:
  pipelines:
    my_pipeline:
      name: "my-pipeline"
      configuration:
        "my_env_setting": "${var.my_env_setting}"
      libraries:
        - notebook:
            path: ./my_notebook.py


STEP 3: READ THE PARAMETER IN YOUR PIPELINE CODE

In Python, use spark.conf.get() to retrieve the value:

from pyspark import pipelines as dp

@dp.table
def my_table():
    env_setting = spark.conf.get("my_env_setting")
    # Use env_setting in your logic
    return spark.read.table(f"{env_setting}.my_schema.my_source_table")

In SQL, use the ${} template syntax:

CREATE OR REFRESH MATERIALIZED VIEW my_view AS
SELECT *
FROM ${my_env_setting}.my_schema.my_source_table


IMPORTANT NOTES

1. Parameter keys can only contain alphanumeric characters, underscores (_), hyphens (-), and dots (.).
2. Parameter values are always set as strings.
3. You can also pass variables at deployment time from the CLI:
databricks bundle deploy --var="my_env_setting=custom_value"
4. If you prefer, you can also set overrides in a file at .databricks/bundle/<target>/variable-overrides.json.
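For option 4, the overrides file is plain JSON mapping variable names to values (a minimal sketch; the variable name here matches the example above, and the <target> path segment is your target name, e.g. dev):

```json
{
  "my_env_setting": "custom_value"
}
```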

DOCUMENTATION REFERENCES

- Use parameters with pipelines: https://docs.databricks.com/aws/en/delta-live-tables/parameters.html
- Databricks Asset Bundle variables: https://docs.databricks.com/aws/en/dev-tools/bundles/variables.html
- Configure a pipeline: https://docs.databricks.com/aws/en/delta-live-tables/configure-pipeline.html

This combination of DAB variables (for environment-specific values) and pipeline configuration parameters (for runtime access in code) gives you a clean way to handle environment-driven configuration without widgets.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

View solution in original post

8 REPLIES 8

saurabh18cs
Honored Contributor III
Under the resources folder you can add these two files:
file1.yml
variables:
  my_dynamic_variable:
    description: The name of the application.
    type: string
    default: app_name
  environment:
    description: The environment in which the application is running.
    type: string
    default: dev
file2.yml
 
(this is just an example extracted from the entire bundle yml file)
tasks:
  - task_key: my_pipeline_task
    pipeline_task:
      pipeline_id: ${resources.pipelines.my_pipeline.id}
    parameters:
      environment: ${var.environment}
      app_name: ${var.my_dynamic_variable}
 
databricks.yml can include folder having those 2 files specified above:
include:
  - ./resources/*.yml


# In your DLT pipeline notebook/file
environment = spark.conf.get("environment")
dynamic_var = spark.conf.get("app_name")

These are job parameters. Can I provide them just for the pipeline task?

IM_01
Contributor II

I believe you can use the configuration section under settings in LDP and provide key-value pairs, then use the key as the parameter name in your code.

pradeep_singh
Contributor

You will have to pass it as a configuration parameter and then read it in your pipeline code using spark.conf.get.

pradeep_singh_0-1769922979884.png

and then in your pipeline notebook code you use:

demo_catalog = spark.conf.get("demo_catalog")

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

Does it have to be static? Or could it be dynamic, provided from another task?

szymon_dybczak
Esteemed Contributor III

Hi @yit337 ,

As @IM_01 and @pradeep_singh mentioned, you can use configuration under settings in LDP and provide key-value pairs. Then you can refer to those parameters using spark.conf.get("your_parameter_name") in PySpark or using the ${} notation in SQL (as in the example screenshot below):

szymon_dybczak_0-1769939315649.png

 

But those parameters are static. If you really want to provide dynamic values, there's an ugly workaround: you can use the Databricks CLI to override parameters, as suggested in the following reply:

szymon_dybczak_0-1769938885514.png

Below you have a link to entire discussion:

Triggering DLT Pipelines with Dynamic Parameters - Databricks Community - 111581
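If you manage the pipeline through a bundle, the override can be combined with a redeploy-and-run cycle. This is only a sketch: `my_pipeline` is a placeholder resource key, and it assumes a `dev` target and a `my_env_setting` variable are defined in your databricks.yml.

```
databricks bundle deploy -t dev --var="my_env_setting=custom_value"
databricks bundle run -t dev my_pipeline
```

Note that this changes the deployed configuration for all subsequent runs too; it is not a true per-run parameter.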

pradeep_singh
Contributor

If you’re looking to build a dynamic, configuration-driven DLT pipeline, a better approach is to use a configuration table. This table should include fields such as table_name, pipeline_name, table_properties, and other relevant settings. Your notebook can then query this table, applying filters for the table and pipeline names that are passed dynamically through variables. The resolved properties can then be accessed directly within your code.

You can always update the parameters and keep things dynamic by updating this table.
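A plain-Python sketch of that lookup logic (the schema and rows here are illustrative; in a real pipeline the rows would come from spark.read.table on your config table, and the column names table_name, pipeline_name, and table_properties are assumptions from the description above):

```python
# In-memory stand-in for the configuration table described above.
CONFIG_ROWS = [
    {"pipeline_name": "sales_pipeline", "table_name": "orders",
     "table_properties": {"source_catalog": "prod", "quality": "silver"}},
    {"pipeline_name": "sales_pipeline", "table_name": "customers",
     "table_properties": {"source_catalog": "prod", "quality": "bronze"}},
]

def resolve_table_config(rows, pipeline_name, table_name):
    """Return the properties for one (pipeline, table) pair, or None if absent."""
    for row in rows:
        if row["pipeline_name"] == pipeline_name and row["table_name"] == table_name:
            return row["table_properties"]
    return None

props = resolve_table_config(CONFIG_ROWS, "sales_pipeline", "orders")
print(props["source_catalog"])  # -> prod
```

Updating a row in the table then changes what the pipeline resolves on its next run, without touching the pipeline definition.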

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev
