Simple append only in DLT

andreacfm
New Contributor

I am facing an issue trying to insert some computed rows into a table in the context of a DLT pipeline.

My use case is extremely simple. Moving from bronze to silver, I update several tables using a mix of streaming and materialized tables. I also need to write some rows into a silver table, but I do not have a streaming source to read from, since the rows are computed/aggregated. I tried append_flow, but it expects a streaming source that I cannot provide. Using once=True works, but as the parameter says, it runs only once.

It seems extremely weird that Databricks does not support such a simple use case. Am I missing something?

 
1 ACCEPTED SOLUTION

Louis_Frolio
Databricks Employee

Greetings @andreacfm

You're not missing a thing. What you're seeing is a known limitation in how DLT/Lakeflow pipelines handle append_flow. It really does expect a streaming source, and the once=True flag only fires during the first run of the pipeline or when you explicitly ask for a full refresh. So your observations line up with the intended behavior.

Let's dig into the practical workarounds that teams typically use when they need to insert computed rows without a true streaming source.

Workarounds for inserting computed rows

Materialized views for aggregations

For silver-layer rollups or derived metrics, materialized views are the cleanest pattern. They're designed for batch-style transforms, and you get predictable refresh behavior without pretending your data came from a stream.

Example:

import dlt
from pyspark.sql import functions as F

# Materialized view: recomputed on each pipeline update, no streaming source required.
@dlt.table(name="silver.aggregated_metrics")
def aggregated_metrics():
    return (
        dlt.read("bronze.source_table")          # batch read of the bronze source
            .groupBy("dimension")
            .agg(F.sum("metric").alias("total"))
    )

Static-lookup table pattern

If you're dealing with reference data (static rows, small business-logic tables, etc.), define the table outside of a streaming context. DLT will treat it as a materialized view, and you can join it with streaming tables using stream-to-static joins. This preserves lineage and avoids bending the streaming model in ways it doesn't support. A rough sketch of the pattern follows.
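As a minimal sketch (the table names silver.business_rules, bronze.orders, and the sample rows are hypothetical, and spark is assumed to be provided by the pipeline runtime), it could look something like this:

import dlt

# Reference table defined without a streaming source; DLT treats it as a materialized view.
@dlt.table(name="silver.business_rules")
def business_rules():
    # spark is supplied by the DLT runtime; these rows are purely illustrative
    return spark.createDataFrame(
        [("US", 0.07), ("EU", 0.21)],
        ["region", "tax_rate"],
    )

# Stream-to-static join: the streaming side drives incremental processing,
# while the static side is re-read on each update.
@dlt.table(name="silver.orders_enriched")
def orders_enriched():
    orders = dlt.read_stream("bronze.orders")
    rules = dlt.read("silver.business_rules")
    return orders.join(rules, on="region", how="left")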

One-time batch append with once=True

If your logic truly needs to run only during backfills or initial ingestion, then yes, append_flow with once=True is the intended pattern. But it won't fire repeatedly, which is exactly what you've run into.
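For completeness, a hedged sketch of that one-time append (the target name and rows below are made up, and spark again comes from the pipeline runtime):

import dlt

# Streaming table that one or more flows append into.
dlt.create_streaming_table("silver.events")

# One-time batch append: runs on the first pipeline update (or a full refresh), then is skipped.
@dlt.append_flow(target="silver.events", once=True)
def backfill_computed_rows():
    return spark.createDataFrame(
        [("2024-01-01", "manual_adjustment", 42)],
        ["event_date", "event_type", "value"],
    )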

Why the limitation exists

DLT is deliberately built around declarative, incremental processing. It expects data to come from sources it can track over time. Arbitrary row insertion without a traceable upstream source breaks that model, especially around incremental refresh, lineage, and reproducibility. So the framework simply doesn't allow it unless you wrap the logic in a source-driven or MV-driven construct.

Hope this helps clarify the landscape.

Regards, Louis.

