Problems and questions with deploying Lakeflow Declarative Pipeline using Databricks Bundles

AlbertWang
Valued Contributor

 

Hi all,

I ran into some problems and have some questions about deploying a Lakeflow Declarative Pipeline using Databricks Asset Bundles. Could anyone kindly help?

Below is my current bundle resource file for the pipeline:

 

resources:
  pipelines:
    dbr_d365_crm_pipeline:
      name: dbr_d365_crm_pipeline
      libraries:
        - file:
            path: ../src/pipeline/transformations/**
      clusters:
        - label: default
          aws_attributes: {}
          node_type_id: Standard_D4ads_v5
          driver_node_type_id: Standard_D4ads_v5
          num_workers: 0
      configuration:
        env: ${bundle.target}
        tables_config: ${var.tables_config}
      catalog: ag_dbr_ctlg_silver_${bundle.environment}
      schema: d365_crm
      continuous: false
      photon: false
      development: ${var.is_dev}
      edition: ADVANCED
      channel: CURRENT
      serverless: false

 

1. The `schema` field does not work. The documentation says:

AlbertWang_0-1754014007933.png

However, when I run `databricks bundle deploy`, I get the warning "unknown field: schema" and the error "The target schema field is required for UC pipelines". After changing `schema` to `target`, the deployment works.
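For reference, this is the variant that deploys successfully for me (same catalog/schema values as in my resource file above):

```yaml
resources:
  pipelines:
    dbr_d365_crm_pipeline:
      catalog: ag_dbr_ctlg_silver_${bundle.environment}
      # `schema: d365_crm` is rejected with "unknown field: schema";
      # `target` is accepted by my CLI version:
      target: d365_crm
```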

2. Even though I set `num_workers` to `0`, the deployed pipeline's cluster mode is still "Enhanced autoscaling", defaulting to 1–5 workers. I don't know how to configure the pipeline's cluster mode to "Fixed size" with 0 workers using Bundles.

AlbertWang_1-1754014301014.png
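My best guess, borrowed from how single-node interactive clusters are declared (the `spark_conf` and `custom_tags` values below), is the following; I have not verified that pipeline clusters accept these settings, so please correct me if this is wrong:

```yaml
      clusters:
        - label: default
          node_type_id: Standard_D4ads_v5
          driver_node_type_id: Standard_D4ads_v5
          num_workers: 0
          # guess: single-node profile settings as used for regular clusters
          spark_conf:
            spark.master: "local[*]"
            spark.databricks.cluster.profile: singleNode
          custom_tags:
            ResourceClass: SingleNode
```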

3. When I create the pipeline manually in the UI, I can set the pipeline root folder, but I cannot find a way to do this when deploying with Bundles.
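For what it's worth, the Pipelines API appears to have a `root_path` setting; I don't know whether the bundle schema accepts it, so this is only a guess:

```yaml
resources:
  pipelines:
    dbr_d365_crm_pipeline:
      # guess: may or may not be supported by the bundle schema
      root_path: /Workspace/Users/xxx/xxx/xxx
```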

4. When I create the pipeline manually in the UI, I can set the pipeline's source code to a folder, and the corresponding YAML shows:

 

libraries:
  - glob:
      include: /Workspace/Users/xxx/xxx/xxx/transformations/**

 

However, I cannot use `glob` in the Bundles pipeline resource file; I can only use `file`, as shown in the code above.
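For completeness, this is the form I tried in the bundle resource file, mirroring the UI-generated YAML; my CLI version rejects it (perhaps newer CLI releases accept `glob` here):

```yaml
      libraries:
        - glob:
            include: ../src/pipeline/transformations/**
```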

5. When I create a pipeline manually in the UI, the pipeline accepts the following code:

 

import dlt
from pyspark.sql.functions import col

def create_scd2_table(view_name, scd2_table_name, keys, sequence_by):
    dlt.create_streaming_table(f"{catalog_silver}.{schema}.{scd2_table_name}")
    dlt.create_auto_cdc_flow(
        target=f"{catalog_silver}.{schema}.{scd2_table_name}",
        source=view_name,
        keys=keys,
        sequence_by=col(sequence_by),
        stored_as_scd_type=2,
    )

 

And

 

from pyspark.sql.functions import col, lit, when

def create_materialized_view(scd2_table_name, scd2_materialized_view_name):
    @dlt.table(name=f"{catalog_gold}.{schema}.{scd2_materialized_view_name}")
    def mv():
        return (
            dlt.read(f"{catalog_silver}.{schema}.{scd2_table_name}")
            .withColumn("is_current", col("__END_AT").isNull())
            .withColumn(
                "__END_AT",
                when(col("__END_AT").isNull(), lit(MAX_END_AT)).otherwise(col("__END_AT")),
            )
        )

 

That means I can control where the streaming tables and materialized views are created (which UC catalog and schema). However, the pipeline deployed via Bundles does not seem to support this: I cannot set the catalog and schema per table, and everything must be created under the pipeline's catalog and schema.

Can anyone help?

Thank you.

Regards,

Albert