Data Engineering

Global Parameter at the Pipeline level in Lakeflow Job

Nidhig
Contributor

Hi,
Is there a workaround, or could Databricks enable a global parameters feature at the pipeline level in Lakeflow jobs?
I am currently migrating an ADF pipeline schedule setup to Lakeflow jobs.

 

1 ACCEPTED SOLUTION


mark_ott
Databricks Employee

Databricks Lakeflow Declarative Pipelines do not currently support truly global parameters at the pipeline level in the same way that Azure Data Factory (ADF) allows, but there are workarounds that enable parameterization to streamline migration from ADF pipelines to Lakeflow jobs.

Pipeline-Level Parameterization

Lakeflow pipelines support pipeline-level parameters by allowing you to define key-value pairs in the pipeline configuration (either via the workspace UI or JSON). These parameters can be referenced within your pipeline code, making it possible to centralize and reuse values, similar to ADF's global parameters. However, these parameters are not truly global: their scope is limited to the pipeline in which they are defined, and updating their value requires editing the pipeline configuration.

You can access pipeline parameters in SQL by referencing them as variables (e.g., ${source_catalog}) and in Python by retrieving them from Spark configuration (spark.conf.get("parameter_name")). Example configuration for a pipeline:

json
{
  "name": "Data Ingest - DEV",
  "configuration": {
    "mypipeline.startDate": "2021-01-02"
  }
}

And in code:

python
start_date = spark.conf.get("mypipeline.startDate")
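
To show how such a parameter might be consumed inside a declarative pipeline, here is a minimal sketch; the mypipeline.startDate key mirrors the configuration above, while the source table and column names are placeholders:

python
import dlt

# Read the pipeline-level parameter defined in the configuration shown above.
start_date = spark.conf.get("mypipeline.startDate")

@dlt.table(comment="Orders filtered by the configured start date (illustrative).")
def filtered_orders():
    # `samples.tpch.orders` and `o_orderdate` are placeholder source names.
    return (
        spark.read.table("samples.tpch.orders")
             .filter(f"o_orderdate >= DATE'{start_date}'")
    )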

Parameter Passing Workarounds

For more dynamic scenarios, such as injecting job-level parameters or making parameters available across multiple pipelines, the current workaround is to use job parameters when scheduling Lakeflow jobs via the Jobs UI or API. Job parameters let you pass values into specific job runs and share context between tasks, but you cannot yet set parameters that automatically propagate across all pipelines globally.
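
As a rough illustration of the job-parameter approach, the sketch below triggers a job run and passes job parameters through the Jobs API 2.1 run-now endpoint; the workspace URL, token, job ID, and parameter names are placeholders, and the target job is assumed to declare matching job-level parameters:

python
# Rough sketch (placeholders throughout): trigger a Lakeflow job run and pass
# job parameters via the Jobs API 2.1 run-now endpoint.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123456789,              # placeholder job ID
        "job_parameters": {               # values made available to this run's tasks
            "environment": "dev",
            "start_date": "2021-01-02",
        },
    },
)
resp.raise_for_status()
print(resp.json())  # response includes the run_id of the triggered run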

Some users also utilize Databricks widgets (dbutils.widgets) in notebooks for more flexible parameter passing when integrating with ADF or orchestration tools (e.g., Airflow), which can further help mimic ADF's global parameter behavior for reusability.
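
A minimal widget sketch, assuming a notebook task that receives a parameter named environment (a name chosen here purely for illustration):

python
# Inside a Databricks notebook task; dbutils is available automatically.
# "environment" is a parameter name chosen for illustration only.
dbutils.widgets.text("environment", "dev")         # declare the widget with a default
environment = dbutils.widgets.get("environment")   # value passed by the job or ADF at run time

print(f"Running for environment: {environment}")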

Recommendations for Migration

  • Define pipeline parameters in the Lakeflow configuration for each pipeline.

  • Pass environment-specific values (dev/prod dates, source catalog names, etc.) via configuration, and reference them in your code.

  • When scheduling jobs, use job parameters to control run-specific input values.

  • For cases requiring true global context (like environment names or retention thresholds), maintain a configuration table or file accessible by all pipelines (see the sketch below), or standardize parameter keys and values across pipeline configurations manually.
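
A minimal sketch of the configuration-table approach, assuming a small key/value table named config.pipeline_settings (the table and its columns are hypothetical):

python
# Hypothetical shared configuration table with string columns `key` and `value`.
settings_df = spark.table("config.pipeline_settings")
settings = {row["key"]: row["value"] for row in settings_df.collect()}

environment = settings.get("environment", "dev")            # fallback default
retention_days = int(settings.get("retention_days", "30"))  # fallback default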

While Lakeflow makes parameter passing more SQL- and Python-friendly and allows passing SQL outputs as parameters between tasks, a built-in global parameters feature (like in ADF) is not natively available yet. Continuous updates to Lakeflow may address this limitation in the future, so monitoring the Databricks release notes and community forums is advised.
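
For completeness, one way to pass run-specific context between tasks today is task values; the sketch below uses dbutils.jobs.taskValues with hypothetical task and key names:

python
# In an upstream task: publish a value for downstream tasks in the same job run.
dbutils.jobs.taskValues.set(key="max_date", value="2021-01-02")

# In a downstream task: read it back, with a default for safety.
max_date = dbutils.jobs.taskValues.get(
    taskKey="compute_max_date",   # name of the upstream task (hypothetical)
    key="max_date",
    default="1900-01-01",
)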

