DLT Runtime Values
11-12-2024 02:03 PM
When my pipeline runs, I need to query a table in the pipeline before I create another table, and for that query I need the target catalog and target schema. I assumed the notebook would run in the context of the catalog and schema configured at the pipeline level, so that I would not need to qualify the table name. However, that is not the case, and I can't find a way to read these pipeline configuration values at run-time. Is there a way to do this?
I want to do something like this at run-time of a DLT pipeline:
catalog = spark.conf.get("target_catalog")  # placeholder key; the real key name is what I'm asking about
schema = spark.conf.get("target_schema")  # placeholder key
table_name = "a"
df = spark.sql(f"select * from {catalog}.{schema}.{table_name}")
How do I get the target_catalog and target_schema values at run-time from the pipeline? I've searched high and low for the answer but I've come up empty-handed.
Any help is appreciated.
11-12-2024 10:07 PM
Can you set up notebook parameters and pass them in the DLT pipeline? https://docs.databricks.com/en/jobs/job-parameters.html
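A minimal sketch of that idea, assuming the values are added as key-value pairs in the pipeline's configuration settings and read back with spark.conf.get. The key names mypipeline.target_catalog and mypipeline.target_schema and both helper functions are made up for illustration; conf stands for any object with a get(key) method, such as spark.conf.

```python
def read_target(conf,
                catalog_key="mypipeline.target_catalog",
                schema_key="mypipeline.target_schema"):
    """Return (catalog, schema) read from a Spark-style conf object.

    The default key names are hypothetical; they must match whatever
    key-value pairs were added to the pipeline configuration.
    """
    return conf.get(catalog_key), conf.get(schema_key)


def qualified_name(catalog, schema, table):
    """Build a backtick-quoted three-level table name."""
    parts = (catalog, schema, table)
    # Double any embedded backtick so each identifier stays a single token.
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


# In a pipeline notebook (assumes `spark` is provided by the runtime):
# catalog, schema = read_target(spark.conf)
# df = spark.sql(f"select * from {qualified_name(catalog, schema, 'a')}")
```

Quoting each part separately keeps catalog or schema names that contain dashes or dots from breaking the query.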
11-13-2024 02:05 AM
Yes, I can. But since I already have these values in the pipeline configuration, it seemed repetitive to configure the same values again as parameters. Another benefit of reading them from the pipeline configuration (Destination section), rather than from job parameters or the pipeline's advanced configuration parameters, is that they cannot be changed in the pipeline (or at least not easily).
Is there no way to read pipeline configuration values like the destination catalog and destination schema at run-time?
10-04-2025 08:54 AM
Any thoughts on this? I want to read the default catalog and default schema at runtime and store them in Python variables. I want these values sourced from the pipeline settings. spark.conf.getAll() does not work.
Databricks Assistant suggests the following, but this doesn't work either. The error indicates these configs don't exist:
To read the default catalog and default schema from the Lakeflow Declarative Pipeline settings into Python variables, use the following Spark configuration keys:
- spark.databricks.sql.initial.catalog for the default catalog
- spark.databricks.sql.initial.schema for the default schema
Here is how you can assign them to Python variables:
default_catalog = spark.conf.get("spark.databricks.sql.initial.catalog")
default_schema = spark.conf.get("spark.databricks.sql.initial.schema")
These variables will reflect the catalog and schema set in your pipeline configuration. If you want to provide fallback values, you can use the os.getenv approach, but the Spark config is the authoritative source for pipeline settings.
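Since the keys above are reported not to exist in the pipeline, a defensive read can at least avoid the error. The sketch below tries a list of candidate keys and returns the first one that is set. It assumes conf behaves like spark.conf, whose get(key, default) returns the default when the key is missing. The helper name first_available and the key mypipeline.target_catalog are hypothetical.

```python
def first_available(conf, keys, fallback=None):
    """Return the value of the first key in `keys` that is set on `conf`.

    `conf.get(key, None)` mirrors PySpark's `spark.conf.get(key, default)`,
    which returns the default instead of raising when the key is missing.
    `fallback` is returned when none of the keys is set.
    """
    for key in keys:
        value = conf.get(key, None)
        if value is not None:
            return value
    return fallback


# In a pipeline notebook (assumes `spark` is provided by the runtime).
# The first key is the one suggested in this thread; the second is a
# hypothetical custom key added to the pipeline configuration.
# default_catalog = first_available(
#     spark.conf,
#     ["spark.databricks.sql.initial.catalog", "mypipeline.target_catalog"],
# )
```

This does not make the destination settings readable. It only falls through to whichever key is actually set.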