03-22-2026 03:08 AM
Hi,
I was going through the documentation on quarantining records. Initially I thought that partitioning is not supported for temporary tables, but then I came across the following:
@dp.table(
    temporary=True,
    partition_cols=["is_quarantined"],
)
@dp.expect_all(rules)
def trips_data_quarantine():
    return (
        spark.readStream.table("raw_trips_data")
        .withColumn("is_quarantined", expr(quarantine_rules))
    )

I'm unable to understand how partitioning is being applied for a temporary table. Could anyone please explain this?
03-22-2026 02:26 PM
Hi @IM_01,
Good question. It's the terminology (temporary) that is probably causing the confusion.
In Declarative Pipelines, @dp.table(temporary=True, ...) creates a real Delta table on disk, just like a normal table. The only difference is visibility: it is not published to Unity Catalog or the Hive metastore, and it is accessible only inside that pipeline. It also persists across pipeline runs for the lifetime of the pipeline, not just for a single run/session. Because it's still a proper Delta table under the hood, all the physical table features apply, including partition_cols. The engine creates an internal table in the pipeline's internal schema with the same partitioning as for a non-temporary table. This is also explained in a technical blog here.
In your example, trips_data_quarantine is a physically partitioned Delta table on is_quarantined (so reads from it can prune on that column). This is private to the pipeline (not queryable from outside, not visible in UC) and is cleaned up when the pipeline is deleted.
Partitioning is fully honoured. Temporary here means pipeline-private, not an ephemeral in-memory temp view.
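To make the pruning concrete, a downstream split on the quarantine flag might look like this. This is a sketch only: `dp`, `spark`, and trips_data_quarantine follow your snippet, while trips_data_valid and trips_data_invalid are hypothetical table names.

```python
from pyspark.sql.functions import col

# Hypothetical downstream table of clean rows. The filter is on the
# partition column, so reads can prune to the is_quarantined=false files.
@dp.table
def trips_data_valid():
    return (
        spark.readStream.table("trips_data_quarantine")
        .where(col("is_quarantined") == False)
        .drop("is_quarantined")
    )

# Hypothetical table of quarantined rows, kept for inspection and repair.
@dp.table
def trips_data_invalid():
    return (
        spark.readStream.table("trips_data_quarantine")
        .where(col("is_quarantined") == True)
    )
```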
Does this answer your question?
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
03-23-2026 03:51 AM
Hi
Thanks for the response & for sharing the article.
I was going through the article and had a doubt: a view is a logical construct, so it only gives results when we read it, and as per my knowledge an append-only source must be used as the source for a streaming table. I'm just confused how we are able to use a view as a streaming source. Could you please explain this?
03-23-2026 05:35 AM
@Ashwin_DSA could you please help with the above?
Sorry, forgot to mention you above & unable to edit it.
03-23-2026 08:09 AM
Hi @IM_01,
You're correct on both points. A view is purely logical, and a streaming table still needs an append-only source. What makes this work is that streaming is applied to the underlying Delta tables, not to the view object itself.
When you do STREAM(my_view) or spark.readStream.table("my_view"), Databricks expands the view definition (CREATE VIEW my_view AS SELECT ... FROM base_table ...) and builds a streaming plan directly over the base Delta table(s) that the view references. In other words, it's equivalent to copying the view's SELECT into your streaming query. The fact that it's called a view is just naming. The streaming engine still talks to the underlying Delta tables.
The constraint "source must be append-only" is checked on the Delta table behind the view, not on the view itself. Structured Streaming only supports streaming from Delta tables if upstream writes are append-only. UPDATE/DELETE/MERGE will break the stream unless you use options like skipChangeCommits.
In DLT/SDP, streaming tables can only read from streaming/append-only sources (Auto Loader, streaming tables, Delta tables used in append-only mode). If the view's query wraps an append-only Delta table with allowed operators (projection, filters, certain joins, etc.), then the resulting streaming plan is still over an append-only source.
So, you're not really streaming from a view in the physical sense. You're streaming from the append-only Delta table(s) behind that view, with the view's SQL simply applied on top of the streaming plan.
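As a sketch of what that inlining looks like in pipeline code (names are illustrative; `dp` and `spark` as in the earlier snippet; not an exact implementation):

```python
# A pipeline view: purely logical, nothing is materialized.
@dp.view
def recent_trips():
    # The view body reads the append-only base Delta table as a stream;
    # the filter is just an operator on top of that streaming plan.
    return spark.readStream.table("raw_trips_data").where("trip_distance > 0")

# Reading the view here is equivalent to inlining its SELECT:
# a streaming read of raw_trips_data plus the view's filter.
@dp.table
def trips_clean():
    return spark.readStream.table("recent_trips")
```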
Hope that answers your question.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
03-23-2026 12:05 PM
Hi @Ashwin_DSA
Thanks for the response ๐
But in the case of a materialized view as the source for a streaming table, it should not work, right? Yet I see the pipeline running without error. Could you please help with this? Also, the refresh can be full or incremental as per row_id tracking, right?
Sorry for so many questions.
03-23-2026 12:37 PM
Hi @IM_01,
You're right to be suspicious here... a materialized view is not a safe or supported source for a streaming table, even if your pipeline currently runs without errors.
Streaming tables are built on Structured Streaming and assume the source is append-only. New versions of the source table may only add rows. Existing rows are never changed. They only process new data since the last checkpoint and do not recompute history.
Materialized views are maintained by Enzyme with batch semantics. On each refresh (incremental or full) they are free to update or delete existing rows in their backing Delta table to keep the result correct. A full refresh can even truncate and recompute everything.
So if you do:
CREATE OR REFRESH STREAMING TABLE st AS
SELECT * FROM STREAM(my_mv);
you are relying on the MV never rewriting existing rows. The fact that your pipeline runs today usually just means the MV hasn't yet done anything but pure appends. It's not a pattern you should rely on.
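If you need the stream, a safer pattern is to bypass the MV and stream from the append-only table behind it, re-applying the MV's logic inline. A sketch, where base_table and the transformations are placeholders for whatever the MV actually computes (and assuming those transformations are stream-safe):

```python
# Unsafe: streaming from the MV itself; a refresh that updates or
# deletes rows in its backing table would break the stream.
#     return spark.readStream.table("my_mv")

# Safer: stream from the append-only base table and apply the MV's
# stream-safe logic (projections, filters) directly in the query.
@dp.table
def st():
    return (
        spark.readStream.table("base_table")
        .select("*")  # replace with the MV's stream-safe transformations
    )
```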
Hope this answers your question.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
03-24-2026 02:11 AM - edited 03-24-2026 02:19 AM
@Ashwin_DSA So if we use a view as a streaming source, it can only reference streaming tables that are part of the SDP pipeline, or Delta tables created outside the pipeline, but not materialized views.
And even a temporary view can be used as a streaming source, right?
03-24-2026 03:13 AM
Hi @IM_01,
You can stream from views (including SDP temporary views) as long as those views ultimately sit on append-only streaming/Delta tables. Using a materialized view as the base is not supported. MV refresh can rewrite history (updates/deletes), breaking the append-only contract that streaming tables rely on, even if your pipeline currently runs without errors.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
03-24-2026 03:25 AM
Thanks Ashwin