Hi @Digvijay_11,
Here are answers to each of your three questions about Lakeflow Spark Declarative Pipelines (SDP):
1. RUNNING SDP PIPELINES IN PARALLEL WITH DYNAMIC PARAMETERS
SDP automatically derives the dependency graph from your table and view definitions. If two datasets do not depend on each other, the engine executes them in parallel without any extra configuration on your part; you do not need to write explicit parallel logic.
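To make the idea concrete, here is a small standalone sketch of how datasets in a dependency graph fall into "waves" that can run concurrently. This is not SDP's actual scheduler, just an illustration using Python's standard-library topological sorter over an invented graph (all table names here are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each dataset maps to the datasets it reads from.
deps = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "clean_customers": {"raw_customers"},
    "order_summary": {"clean_orders", "clean_customers"},
}

# Group datasets into waves: datasets in the same wave have no dependency
# on each other, so an engine is free to execute them in parallel.
sorter = TopologicalSorter(deps)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = list(sorter.get_ready())
    waves.append(sorted(ready))
    sorter.done(*ready)

print(waves)
```

Here raw_orders and raw_customers land in the same first wave, which is exactly the situation where SDP parallelizes for you.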
For dynamic parameters, you define key-value pairs in the pipeline configuration (either through the UI or in your pipeline JSON definition). In Python, you reference them with spark.conf.get():
source_catalog = spark.conf.get("my_pipeline.source_catalog")
In SQL, you use the ${} syntax:
SELECT * FROM ${my_pipeline.source_catalog}.schema.table
You can set different values per environment. For example, in a JSON pipeline definition:
{
  "name": "My Pipeline - DEV",
  "configuration": {
    "my_pipeline.source_catalog": "dev_catalog",
    "my_pipeline.start_date": "2025-01-01"
  }
}
Note: parameter keys can contain underscores, hyphens, periods, and alphanumeric characters. Values are always strings. Avoid using reserved Spark configuration keys.
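As a quick sanity check, the character rule from the note above can be expressed as a regex. This is only an illustration of the rule as stated here, not an official validation routine from Databricks:

```python
import re

# Pattern derived from the note above: alphanumerics, underscores,
# hyphens, and periods only (an assumption based on the note, not a spec).
KEY_PATTERN = re.compile(r"^[A-Za-z0-9_.\-]+$")

def is_valid_param_key(key: str) -> bool:
    """Return True if the key uses only the characters the note allows."""
    return bool(KEY_PATTERN.match(key))

print(is_valid_param_key("my_pipeline.source_catalog"))  # True
print(is_valid_param_key("bad key!"))                    # False
```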
Docs: https://docs.databricks.com/aws/en/ldp/parameters
2. JOB-LEVEL VS PIPELINE-LEVEL PARAMETERS
You are correct that when the same parameter name is defined at both the job level and the pipeline level, the pipeline-level configuration takes precedence and the job-level value is overridden. This is by design: the pipeline configuration is the authoritative source for pipeline parameters.
The recommended approach is to use distinct naming conventions to avoid collisions. For example, prefix your pipeline parameters with a namespace like "mypipeline." (e.g., mypipeline.env, mypipeline.source_path). This keeps them separate from any job-level parameters. If you need to pass values from a job into a pipeline task, use unique parameter names that do not overlap with the pipeline's own configuration keys.
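The precedence rule can be sketched with plain dictionaries: pipeline-level values win on a name collision, which is why namespaced keys keep the two sets from clashing. The parameter names below are invented for illustration and do not come from any real configuration:

```python
# Hypothetical parameter sets, modeled as plain dicts for illustration.
job_params = {"env": "prod", "run_date": "2025-06-01"}
pipeline_params = {"env": "dev", "mypipeline.source_path": "/mnt/raw"}

# Pipeline-level configuration takes precedence on a collision,
# mirroring the behavior described above: "env" resolves to "dev".
effective = {**job_params, **pipeline_params}

print(effective["env"])
```

Had the job used a distinct name like "myjob.env" (or the pipeline used "mypipeline.env"), no key would have been silently overridden.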
Alternatively, if your pipeline is a task within a multi-task job and you need to propagate values from upstream tasks, set task values (dbutils.jobs.taskValues.set) in a notebook task that runs before the pipeline task, then reference them through dynamic value references such as {{tasks.<task_name>.values.<key>}}, using names that do not collide with the pipeline's own configuration keys.
Docs: https://docs.databricks.com/aws/en/jobs/parameters
3. DO YOU ALWAYS HAVE TO CREATE A DELTA LIVE TABLE?
No, you do not always have to create a Delta table. SDP supports three dataset types:
- Streaming tables: for incremental, append-only workloads (e.g., ingesting from cloud storage or message buses).
- Materialized views: for batch transformations that are recomputed on each pipeline update.
- Temporary views: for intermediate transformations that do not persist data. These are useful when you need a transformation step but do not want to store the result as a table.
That said, streaming tables and materialized views are backed by Delta and provide benefits like ACID transactions, time travel, and schema enforcement. Temporary views are only available within the pipeline run and are not stored. Choose the dataset type based on whether you need the data to persist and be queryable outside the pipeline.
In Python:
import dlt

# Temporary view: computed during the update, never persisted as a table.
@dlt.view
def my_temp_view():
    return spark.read.table("source_table").filter("status = 'active'")

# Delta-backed table that reads from the temporary view above.
@dlt.table
def my_final_table():
    return dlt.read("my_temp_view").groupBy("category").count()
In SQL:
CREATE TEMPORARY LIVE VIEW my_temp_view AS
SELECT * FROM source_table WHERE status = 'active';
CREATE OR REFRESH LIVE TABLE my_final_table AS
SELECT category, count(*) as cnt FROM LIVE.my_temp_view GROUP BY category;
Docs: https://docs.databricks.com/aws/en/ldp/index
* This reply was drafted with an agent system I built, which researches and drafts responses based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor the system's reliability, and I update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.