
DLT Runtime Values

MarkV
New Contributor III

When my pipeline runs, I need to query a table in the pipeline before I create another table, and that query needs the target catalog and target schema. I assumed the notebook would run automatically in the context of the catalog and schema configured at the pipeline level, so I wouldn't need to qualify the table name with catalog and schema, but that isn't the case. I can't find a way to read these pipeline configuration values at run-time. Is there a way to do this?

I want to do something like this at run-time of a DLT pipeline:

# target_catalog / target_schema are placeholders for whatever the real config keys are
catalog = spark.conf.get("target_catalog")
schema = spark.conf.get("target_schema")

table_name = "a"
df = spark.sql(f"SELECT * FROM {catalog}.{schema}.{table_name}")

How do I get the target_catalog and target_schema values at run-time from the pipeline? I've searched high and low for the answer, but I've come up empty-handed.

Any help is appreciated.

3 REPLIES

SparkJun
Databricks Employee

Can you set up notebook parameters and pass them into the DLT pipeline? https://docs.databricks.com/en/jobs/job-parameters.html
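
For illustration, a minimal sketch of that approach, assuming key/value pairs are added under the pipeline's Configuration settings (the key names below are made up for the example):

# Assumes entries like these were added in the pipeline's Configuration
# (Advanced settings); the key names are hypothetical:
#   my_pipeline.target_catalog = main
#   my_pipeline.target_schema  = sales
catalog = spark.conf.get("my_pipeline.target_catalog")
schema = spark.conf.get("my_pipeline.target_schema")

df = spark.sql(f"SELECT * FROM {catalog}.{schema}.a")

Configuration values set on the pipeline are surfaced to its notebooks through spark.conf, which is what makes this pattern work.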

MarkV
New Contributor III

Yes, I can. But given that I already have these values in the pipeline configuration, it seems repetitive to configure them again as parameters. Also, a benefit of reading these values from the pipeline configuration (the Destination section) rather than from job or pipeline advanced-configuration parameters is that they can't be changed in the pipeline (or at least not changed easily).

Is there no way to read pipeline configuration values like the destination catalog and destination schema at run-time?

MarkV
New Contributor III

Any thoughts on this? I want to read the default catalog and default schema at runtime and store them in Python variables, sourced from the pipeline settings. spark.conf.getAll() does not work.
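
As a side note (not from the thread), one hedged sketch for inspecting what the session currently resolves to uses the current_catalog() and current_schema() SQL functions; whether they reflect the pipeline's target destination during a DLT update is exactly the open question here:

# Sketch only: current_catalog() and current_schema() are standard Databricks
# SQL functions; whether they resolve to the pipeline's target destination
# while a DLT update is running is the unanswered question in this thread.
row = spark.sql("SELECT current_catalog() AS catalog, current_schema() AS schema").first()
default_catalog = row["catalog"]
default_schema = row["schema"]
print(default_catalog, default_schema)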

Databricks Assistant suggests the following, but this doesn't work either. The error indicates these configs don't exist:

To read the default catalog and default schema from the Lakeflow Declarative Pipeline settings into Python variables, use the following Spark configuration keys:

  • spark.databricks.sql.initial.catalog for the default catalog
  • spark.databricks.sql.initial.schema for the default schema

Here is how you can assign them to Python variables:

 
%python
default_catalog = spark.conf.get("spark.databricks.sql.initial.catalog")
default_schema = spark.conf.get("spark.databricks.sql.initial.schema")

These variables will reflect the catalog and schema set in your pipeline configuration. If you want to provide fallback values, you can use the os.getenv approach, but the Spark config is the authoritative source for pipeline settings.
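
Since those keys reportedly aren't set, a more defensive variant of that snippet (keeping the Assistant's hypothetical key names and adding placeholder fallbacks) would at least avoid the error; spark.conf.get accepts a default value as its second argument:

# Keys are the ones suggested by Databricks Assistant above and reportedly do
# not exist in a DLT run; the second argument to spark.conf.get is returned as
# a fallback instead of raising when the key is missing.
default_catalog = spark.conf.get("spark.databricks.sql.initial.catalog", "main")    # "main" is a placeholder fallback
default_schema = spark.conf.get("spark.databricks.sql.initial.schema", "default")   # "default" is a placeholder fallback
print(default_catalog, default_schema)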