
Problems and questions with deploying Lakeflow Declarative Pipeline using Databricks Bundles

AlbertWang
Valued Contributor

 

Hi all,

I ran into some problems and have some questions about deploying a Lakeflow Declarative Pipeline using Databricks Bundles. Could anyone kindly help?

Below is my current bundle resource file for the pipeline:

 

resources:
  pipelines:
    dbr_d365_crm_pipeline:
      name: dbr_d365_crm_pipeline
      libraries:
        - file:
            path: ../src/pipeline/transformations/**
      clusters:
        - label: default
          aws_attributes: {}
          node_type_id: Standard_D4ads_v5
          driver_node_type_id: Standard_D4ads_v5
          num_workers: 0
      configuration:
        env: ${bundle.target}
        tables_config: ${var.tables_config}
      catalog: ag_dbr_ctlg_silver_${bundle.environment}
      schema: d365_crm
      continuous: false
      photon: false
      development: ${var.is_dev}
      edition: ADVANCED
      channel: CURRENT
      serverless: false

 

1. The field `schema` does not work. The documentation says:

AlbertWang_0-1754014007933.png

However, when I run `databricks bundle deploy`, I get the warning "unknown field: schema" and the error "The target schema field is required for UC pipelines". After changing `schema` to `target`, the deployment works.
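For reference, the variant that deployed successfully looks roughly like this (a sketch showing only the relevant fields):

resources:
  pipelines:
    dbr_d365_crm_pipeline:
      catalog: ag_dbr_ctlg_silver_${bundle.environment}
      # `schema: d365_crm` triggers "unknown field: schema" on this CLI version,
      # while `target` is accepted and the deploy succeeds
      target: d365_crm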

2. Even though I set `num_workers` to `0`, the deployed pipeline's Cluster mode is still set to "Enhanced autoscaling" and defaults to 1-5 workers. I don't know how to configure the pipeline's Cluster mode to "Fixed size" with 0 workers using Bundles.

AlbertWang_1-1754014301014.png

3. When I create the pipeline manually in the UI, I can set the pipeline root folder, but I cannot find a way to do this when deploying with Bundles.

4. When I create the pipeline manually in the UI, I can set the pipeline's Source code to a folder, and the corresponding YAML shows:

 

libraries:
  - glob:
      include: /Workspace/Users/xxx/xxx/xxx/transformations/**

 

However, I cannot use `glob` in the Bundles pipeline resource file; I can only use `file`, as shown in the code above.

5. When I create a pipeline manually in the UI, the pipeline accepts the following code:

 

import dlt
from pyspark.sql.functions import col

# catalog_silver and schema are defined elsewhere in the pipeline source
def create_scd2_table(view_name, scd2_table_name, keys, sequence_by):
    # Create the SCD Type 2 streaming table and feed it with an auto CDC flow
    dlt.create_streaming_table(f"{catalog_silver}.{schema}.{scd2_table_name}")
    dlt.create_auto_cdc_flow(
        target=f"{catalog_silver}.{schema}.{scd2_table_name}",
        source=view_name,
        keys=keys,
        sequence_by=col(sequence_by),
        stored_as_scd_type=2,
    )

 

And

 

from pyspark.sql.functions import col, lit, when

# catalog_gold, catalog_silver, schema, and MAX_END_AT are defined elsewhere
def create_materialized_view(scd2_table_name, scd2_materialized_view_name):
    @dlt.table(name=f"{catalog_gold}.{schema}.{scd2_materialized_view_name}")
    def mv():
        # Flag the current record and replace the open-ended __END_AT with MAX_END_AT
        return (
            dlt.read(f"{catalog_silver}.{schema}.{scd2_table_name}")
            .withColumn("is_current", col("__END_AT").isNull())
            .withColumn(
                "__END_AT",
                when(col("__END_AT").isNull(), lit(MAX_END_AT)).otherwise(col("__END_AT")),
            )
        )

 

That means I can customize which UC catalog and schema the streaming tables and materialized views are created in. However, the pipeline deployed via Bundles does not support this: I cannot define the catalog and schema of the streaming tables and materialized views, and they must be created under the pipeline's catalog and schema.

Can anyone help?

Thank you.

Regards,

Albert


5 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @AlbertWang ,

I think some of these issues could be related to your Databricks Asset Bundles version. For example, the `glob` option is in Beta; it could be available in the UI but not yet in your version of the Databricks CLI.

szymon_dybczak_0-1754036843880.png

The same applies to `root_path`:

szymon_dybczak_1-1754037087382.png
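For example, with a recent CLI the pipeline resource should be able to reference a source folder and a root path roughly like this (a sketch based on the docs above; the relative paths are assumptions and need to match your bundle layout):

resources:
  pipelines:
    dbr_d365_crm_pipeline:
      # Both fields require a recent Databricks CLI version (glob is in Beta)
      root_path: ../src/pipeline
      libraries:
        - glob:
            include: ../src/pipeline/transformations/**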

 

As for the `num_workers` issue: a multi-node compute resource can't be scaled to 0 workers. Use single-node compute instead (Compute configuration reference | Databricks Documentation).
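A minimal sketch of a fixed-size, zero-worker (single node) pipeline cluster, assuming pipeline clusters accept the same single-node spark_conf and tags documented for regular compute (worth verifying against the compute configuration reference above):

resources:
  pipelines:
    dbr_d365_crm_pipeline:
      clusters:
        - label: default
          node_type_id: Standard_D4ads_v5
          driver_node_type_id: Standard_D4ads_v5
          num_workers: 0
          spark_conf:
            # Documented single-node settings for Databricks compute;
            # assumption: they apply to pipeline clusters as well
            spark.databricks.cluster.profile: singleNode
            spark.master: local[*]
          custom_tags:
            ResourceClass: SingleNode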


Maybe you have an outdated Databricks CLI version? That would also explain the "unknown field: schema" error: to an outdated CLI, this field would be unknown.

AlbertWang
Valued Contributor

Thank you for your reply, szymon_dybczak.

After upgrading my Databricks CLI, I could configure `schema`, `glob`, and `root_path`. I also figured out how to configure a single-node cluster.

However, I still cannot figure out the reason for the following problem.

 

5. When I create a pipeline manually in the UI, the pipeline accepts the following code:

 

import dlt
from pyspark.sql.functions import col

# catalog_silver and schema are defined elsewhere in the pipeline source
def create_scd2_table(view_name, scd2_table_name, keys, sequence_by):
    # Create the SCD Type 2 streaming table and feed it with an auto CDC flow
    dlt.create_streaming_table(f"{catalog_silver}.{schema}.{scd2_table_name}")
    dlt.create_auto_cdc_flow(
        target=f"{catalog_silver}.{schema}.{scd2_table_name}",
        source=view_name,
        keys=keys,
        sequence_by=col(sequence_by),
        stored_as_scd_type=2,
    )

 

And

 

from pyspark.sql.functions import col, lit, when

# catalog_gold, catalog_silver, schema, and MAX_END_AT are defined elsewhere
def create_materialized_view(scd2_table_name, scd2_materialized_view_name):
    @dlt.table(name=f"{catalog_gold}.{schema}.{scd2_materialized_view_name}")
    def mv():
        # Flag the current record and replace the open-ended __END_AT with MAX_END_AT
        return (
            dlt.read(f"{catalog_silver}.{schema}.{scd2_table_name}")
            .withColumn("is_current", col("__END_AT").isNull())
            .withColumn(
                "__END_AT",
                when(col("__END_AT").isNull(), lit(MAX_END_AT)).otherwise(col("__END_AT")),
            )
        )

 

That means I can customize which UC catalog and schema the streaming tables and materialized views are created in. However, the pipeline deployed via Bundles does not support this: I cannot define the catalog and schema of the streaming tables and materialized views, and they must be created under the pipeline's catalog and schema.

szymon_dybczak
Esteemed Contributor III
(ACCEPTED SOLUTION)

Hi @AlbertWang ,

Cool that most of the issues have been resolved by upgrading DAB to a newer version. Regarding the last error, it's a bit weird; it should work. Check whether you have everything configured according to the article below:

szymon_dybczak_0-1754042864830.png

Publish to Multiple Catalogs and Schemas from a Single DLT Pipeline | Databricks Blog

So, make sure that you're using `schema` (not `target`) in your pipeline. Also, in the thread below, one user suggested checking whether the DPM setting is enabled in your pipeline. It's worth checking.


Solved: Delta Live Tables: dynamic schema - Databricks Community - 57626

szymon_dybczak_1-1754042999630.png

 



In your pipeline settings, you should have `pipelines.enableDPMForExistingPipeline` set to `true`.

Enable the default publishing mode in a pipeline - Azure Databricks | Microsoft Learn
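In a bundle resource file, that setting can presumably be passed through the pipeline's configuration block together with `schema`, roughly like this (a sketch showing only the relevant fields):

resources:
  pipelines:
    dbr_d365_crm_pipeline:
      catalog: ag_dbr_ctlg_silver_${bundle.environment}
      schema: d365_crm   # use schema, not target
      configuration:
        pipelines.enableDPMForExistingPipeline: "true"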

AlbertWang
Valued Contributor

I really appreciate your kind help, szymon_dybczak!

After using `schema`, everything works now.

szymon_dybczak
Esteemed Contributor III

Great, really happy that it worked for you. Thanks for accepting the answer as a solution!