When my pipeline runs, I need to query an existing table before I create another table, and for that query I need to know the pipeline's target catalog and target schema. I figured the notebook would run automatically in the context of the catalog and schema configured at the pipeline level, so I wouldn't need to qualify the table name with catalog and schema. However, that is not the case, and I can't find a way to read these pipeline configuration values at run-time. Is there a way to do this?
I want to do something like this at run-time in a DLT pipeline:

catalog = spark.conf.get("target_catalog")
schema = spark.conf.get("target_schema")
table_name = "a"
df = spark.sql(f"select * from {catalog}.{schema}.{table_name}")
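One workaround I've considered is duplicating the values as custom key/value pairs under the pipeline's Configuration settings and reading them with spark.conf.get, roughly like the sketch below, but then I'm maintaining the catalog and schema in two places. (The key names my_pipeline.target_catalog and my_pipeline.target_schema are just names I would make up in the configuration, not built-in keys.)

import dlt

# Assumes I manually add these entries in the pipeline's Configuration settings,
# e.g. my_pipeline.target_catalog = main, my_pipeline.target_schema = sales
catalog = spark.conf.get("my_pipeline.target_catalog")
schema = spark.conf.get("my_pipeline.target_schema")

@dlt.table
def b():
    # Query the existing table "a", fully qualified with the configured values
    return spark.sql(f"select * from {catalog}.{schema}.a")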
Ideally, though, I'd read the pipeline's own target catalog and target schema settings rather than duplicating them. How do I get those values at run-time from the pipeline? I've searched high and low for an answer but have come up empty-handed.
Any help is appreciated.