Hi all,
I ran into some problems and have a few questions about deploying a Lakeflow Declarative Pipeline using Databricks Asset Bundles. Could anyone kindly help?
Below is my current bundle resource file for the pipeline:
resources:
  pipelines:
    dbr_d365_crm_pipeline:
      name: dbr_d365_crm_pipeline
      libraries:
        - file:
            path: ../src/pipeline/transformations/**
      clusters:
        - label: default
          aws_attributes: {}
          node_type_id: Standard_D4ads_v5
          driver_node_type_id: Standard_D4ads_v5
          num_workers: 0
      configuration:
        env: ${bundle.target}
        tables_config: ${var.tables_config}
      catalog: ag_dbr_ctlg_silver_${bundle.environment}
      schema: d365_crm
      continuous: false
      photon: false
      development: ${var.is_dev}
      edition: ADVANCED
      channel: CURRENT
      serverless: false
1. The field `schema` does not work, even though the documentation indicates it should be accepted. When I run `databricks bundle deploy`, I get the warning "unknown field: schema" and the error "The target schema field is required for UC pipelines". After changing `schema` to `target`, the deployment succeeds.
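For reference, here is the fragment after that change (everything else unchanged from the file above):

      catalog: ag_dbr_ctlg_silver_${bundle.environment}
      target: d365_crm  # was `schema`; deploy succeeds with `target`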
2. Even though I set `num_workers` to 0, the deployed pipeline's cluster mode is still "Enhanced autoscaling", defaulting to 1-5 workers. I cannot figure out how to configure the pipeline's cluster mode to "Fixed size" with 0 workers (i.e. single node) using Bundles.
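The only idea I have found is to mark the cluster as single-node the same way job clusters are marked. I do not know whether these `spark_conf` keys and tags are honored for pipeline clusters, so please treat this as a guess:

      clusters:
        - label: default
          node_type_id: Standard_D4ads_v5
          driver_node_type_id: Standard_D4ads_v5
          num_workers: 0
          spark_conf:
            spark.master: local[*]                        # single-node guess
            spark.databricks.cluster.profile: singleNode  # single-node guess
          custom_tags:
            ResourceClass: SingleNode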
3. When I create the pipeline manually in the UI, I can set the pipeline root folder, but I cannot find a way to do this when deploying with Bundles.
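What I was hoping for is something along these lines; `root_path` is a field name I am guessing at, and I could not confirm that Bundles accepts it:

      resources:
        pipelines:
          dbr_d365_crm_pipeline:
            root_path: /Workspace/Users/xxx/xxx/xxx  # guessed field, placeholder path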
4. When I create the pipeline manually in the UI, I can point the pipeline's Source code at a folder, and the corresponding YAML settings show:

      libraries:
        - glob:
            include: /Workspace/Users/xxx/xxx/xxx/transformations/**

However, I cannot use `glob` in the Bundles pipeline resource file; I can only use `file`, as shown in the code above.
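Concretely, this is the shape I tried in the bundle resource file, without success:

      libraries:
        - glob:
            include: ../src/pipeline/transformations/**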
5. When I create a pipeline manually in the UI, the pipeline accepts the following code:

import dlt
from pyspark.sql.functions import col, lit, when

def create_scd2_table(view_name, scd2_table_name, keys, sequence_by):
    # Create the target streaming table, then feed it with an auto CDC flow
    # that maintains SCD type 2 history.
    dlt.create_streaming_table(f"{catalog_silver}.{schema}.{scd2_table_name}")
    dlt.create_auto_cdc_flow(
        target=f"{catalog_silver}.{schema}.{scd2_table_name}",
        source=view_name,
        keys=keys,
        sequence_by=col(sequence_by),
        stored_as_scd_type=2,
    )
And:

def create_materialized_view(scd2_table_name, scd2_materialized_view_name):
    # Publish a gold-layer table that flags the current SCD2 row and replaces
    # open-ended __END_AT values with a fixed MAX_END_AT sentinel.
    @dlt.table(name=f"{catalog_gold}.{schema}.{scd2_materialized_view_name}")
    def mv():
        return (
            dlt.read(f"{catalog_silver}.{schema}.{scd2_table_name}")
            .withColumn("is_current", col("__END_AT").isNull())
            .withColumn(
                "__END_AT",
                when(col("__END_AT").isNull(), lit(MAX_END_AT))
                .otherwise(col("__END_AT")),
            )
        )
That is, I can control which UC catalog and schema each streaming table and materialized view is created in. However, the pipeline deployed via Bundles does not support this: I cannot set the catalog and schema of the streaming tables and materialized views, and they must be created under the pipeline's own catalog and schema.
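For illustration, a typical call to the two helpers looks like this (table and column names are placeholders):

# Placeholder names, for illustration only.
create_scd2_table(
    view_name="v_accounts",
    scd2_table_name="accounts_scd2",
    keys=["account_id"],
    sequence_by="modifiedon",
)
create_materialized_view("accounts_scd2", "mv_accounts_current")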
Can anyone help?
Thank you.
Regards,
Albert