Re: Problems and questions with deploying Lakeflow...

AlbertWang · ‎08-01-2025

Thank you for your reply, szymon_dybczak.

After upgrading my Databricks CLI, I could configure `schema`, `glob`, and `root_path`. I also figured out how to configure a single-node cluster.

However, I still cannot figure out the reason for the following problem.

5. When I create a pipeline manually in the UI, the pipeline accepts the following code:

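# Imports used by both snippets. catalog_silver, catalog_gold, schema, and
# MAX_END_AT are defined elsewhere in the pipeline code.
import dlt
from pyspark.sql.functions import col, lit, when

def create_scd2_table(view_name, scd2_table_name, keys, sequence_by):
    # Create the target streaming table under an explicit catalog and schema.
    dlt.create_streaming_table(f"{catalog_silver}.{schema}.{scd2_table_name}")
    dlt.create_auto_cdc_flow(
        target=f"{catalog_silver}.{schema}.{scd2_table_name}",
        source=view_name,
        keys=keys,
        sequence_by=col(sequence_by),
        stored_as_scd_type=2
    )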

And

def create_materialized_view(scd2_table_name, scd2_materialized_view_name):
    @dlt.table(name=f"{catalog_gold}.{schema}.{scd2_materialized_view_name}")
    def mv():
        # Flag open-ended SCD2 rows as current, then replace their null __END_AT with MAX_END_AT.
        return (
            dlt.read(f"{catalog_silver}.{schema}.{scd2_table_name}")
            .withColumn("is_current", col("__END_AT").isNull())
            .withColumn(
                "__END_AT",
                when(col("__END_AT").isNull(), lit(MAX_END_AT)).otherwise(col("__END_AT")),
            )
        )

That means that in a pipeline created in the UI, I can choose which UC catalog and schema each streaming table and materialized view is created in. However, the pipeline deployed via Bundles does not support this. I cannot define the catalog and schema of individual streaming tables and materialized views. They must be created under the pipeline's catalog and schema.
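
To make the naming explicit, here is a minimal sketch of how the three-part names in the code above are composed. `qualified_name` and the catalog, schema, and table values are hypothetical, used only for illustration. They are not part of the `dlt` API or of my actual pipeline.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-part Unity Catalog name: catalog.schema.table."""
    parts = (catalog, schema, table)
    if not all(parts):
        raise ValueError("catalog, schema, and table must all be non-empty")
    return ".".join(parts)


# Hypothetical values, for illustration only.
catalog_silver = "silver"
catalog_gold = "gold"
schema = "sales"

# The SCD2 streaming table goes to one catalog, the materialized view to another.
scd2_target = qualified_name(catalog_silver, schema, "customer_scd2")
mv_target = qualified_name(catalog_gold, schema, "customer_scd2_mv")
print(scd2_target)
print(mv_target)
```

The point is that the two targets live in different catalogs, neither of which has to be the pipeline's own default catalog.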