<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT table reading not performing file pruning on partition column in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151096#M53576</link>
    <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133188"&gt;@SteveOstrowski&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for detailed explanation for why partition pruning don't work in streaming queries. Currently I am using dlt.view instead of dlt.table. Will @dlt.view also works as same as dlt.table? As I want to avoid creation of duplicate bronze table.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 17 Mar 2026 00:41:41 GMT</pubDate>
    <dc:creator>Anish_2</dc:creator>
    <dc:date>2026-03-17T00:41:41Z</dc:date>
    <item>
      <title>DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/146828#M52709</link>
      <description>&lt;P&gt;I have created bronze table and partitioned on processing date which is date column. In silver table i am putting filter on basis of processing date column to read last 2 days data but it is reading 37 million data but i have only&amp;nbsp;&lt;SPAN&gt;24722 in last 2 days partitions. Below is screenshot of query profile in Delta live table&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="gkapri_0-1770221522007.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/23621i3012050F2038AF72/image-size/medium?v=v2&amp;amp;px=400" role="button" title="gkapri_0-1770221522007.png" alt="gkapri_0-1770221522007.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Below is function to filter last 2 days data&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;filter_latest_records&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;df&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;timing_column&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;'processing_date'&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;df&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;df&lt;/SPAN&gt;&lt;SPAN&gt;.filter(col(&lt;/SPAN&gt;&lt;SPAN&gt;timing_column&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;&amp;gt;=&lt;/SPAN&gt;&lt;SPAN&gt; date_sub(current_date(), &lt;/SPAN&gt;&lt;SPAN&gt;1&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;df&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Can some help me why it is reading 37 million data?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 04 Feb 2026 16:14:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/146828#M52709</guid>
      <dc:creator>gkapri</dc:creator>
      <dc:date>2026-02-04T16:14:38Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/146884#M52718</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&amp;nbsp;becasue you need a literal not dynamic expression, with dynanic expression all partitions are already scanned. so btr to use literal.&amp;nbsp;try this:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;!--   ScriptorStartFragment   --&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;def filter_latest_records(df, timing_column="processing_date"):&lt;/SPAN&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; yesterday = spark.sql("select date_sub(current_date(), 1) as d").first()["d"]&lt;/SPAN&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return df.filter(col(timing_column) &amp;gt;= yesterday)&lt;!--   ScriptorEndFragment   --&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/214541"&gt;@gkapri&lt;/a&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 05 Feb 2026 11:03:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/146884#M52718</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2026-02-05T11:03:04Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/146903#M52724</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/214541"&gt;@gkapri&lt;/a&gt;&amp;nbsp;Hi, If you use&amp;nbsp;&lt;/SPAN&gt;current_date()&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;directly&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;in the filter, Spark often can’t prune partitions; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;What you can do here, if you first store&amp;nbsp;&lt;/SPAN&gt;current_date()&lt;SPAN&gt;&amp;nbsp;in&amp;nbsp;any&amp;nbsp; variable&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;(as a value), it usually&amp;nbsp;&lt;/SPAN&gt;&lt;EM&gt;can&lt;/EM&gt;&lt;SPAN&gt;&amp;nbsp;prune.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Also, do try to run EXPLAIN &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;so, when you run&amp;nbsp;EXPLAIN, you should see&amp;nbsp;&lt;STRONG&gt;PartitionFilters&lt;/STRONG&gt;&amp;nbsp;like&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;PartitionFilters: [isnotnull(processing_date#...), (processing_date#... &amp;gt;= 2026-02-05)]&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Thu, 05 Feb 2026 13:29:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/146903#M52724</guid>
      <dc:creator>Amit_Dass_Chmp</dc:creator>
      <dc:date>2026-02-05T13:29:37Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147420#M52754</link>
      <description>&lt;P&gt;I tried by defining variable and then performing filter over it and it worked in databricks notebook as shown in below snippet&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="gkapri_0-1770464980158.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/23780iC8D3641E3413D9EB/image-size/medium?v=v2&amp;amp;px=400" role="button" title="gkapri_0-1770464980158.png" alt="gkapri_0-1770464980158.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;But when i tried same using delta live tables, it is not pruning data as you can see in below snippet&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="gkapri_1-1770465048087.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/23781i64BC35C3BC361F5A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="gkapri_1-1770465048087.png" alt="gkapri_1-1770465048087.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Not sure what is happening&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 07 Feb 2026 11:51:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147420#M52754</guid>
      <dc:creator>gkapri</dc:creator>
      <dc:date>2026-02-07T11:51:23Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147456#M52758</link>
      <description>&lt;P&gt;Also one more thing, i am using readstream in pipeline to read bronze table and trying to do partition pruning but it is not happening. Can someone suggest to achieve partition pruning in streaming table&lt;/P&gt;</description>
      <pubDate>Sat, 07 Feb 2026 18:40:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147456#M52758</guid>
      <dc:creator>gkapri</dc:creator>
      <dc:date>2026-02-07T18:40:12Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147609#M52763</link>
      <description>&lt;P&gt;Explanation&lt;/P&gt;&lt;P&gt;Static partition pruning does not work in streaming reads (readStream) by design.&lt;/P&gt;&lt;P&gt;In streaming, Spark:&lt;/P&gt;&lt;P&gt;Can prune files/partitions to some extent&lt;/P&gt;&lt;P&gt;Cannot prune rows inside files&lt;/P&gt;&lt;P&gt;Must scan entire files of selected partitions&lt;/P&gt;&lt;P&gt;Filters like processing_date &amp;gt;= current_date() - n are applied after data is read, not at scan time.&lt;/P&gt;&lt;P&gt;In DLT, current_date() is non-deterministic, which further limits pruning.&lt;/P&gt;&lt;P&gt;What query profile confirms&lt;/P&gt;&lt;P&gt;Partitions read &amp;lt; total partitions → file pruning happened&lt;/P&gt;&lt;P&gt;Rows read still very high → full file scan&lt;/P&gt;&lt;P&gt;Rows returned = 0 → filter applied post-scan&lt;/P&gt;&lt;P&gt;Key takeaway&lt;/P&gt;&lt;P&gt;Partition pruning in Delta streaming works only at file selection level, not row level.&lt;/P&gt;&lt;P&gt;Large historical files inside partitions will still be fully scanned.&lt;/P&gt;&lt;P&gt;Recommended approaches&lt;/P&gt;&lt;P&gt;Use batch read for Bronze → Silver if partition pruning is required&lt;/P&gt;&lt;P&gt;Use Auto Loader to read only new files&lt;/P&gt;&lt;P&gt;Avoid using readStream when selective historical reads are needed&lt;/P&gt;</description>
      <pubDate>Mon, 09 Feb 2026 05:52:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147609#M52763</guid>
      <dc:creator>Poorva21</dc:creator>
      <dc:date>2026-02-09T05:52:33Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147623#M52766</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/214541"&gt;@gkapri&lt;/a&gt;&amp;nbsp;&amp;nbsp;can you do this for DLT pipelines? for DLT we need a ccomplile time SQL literal not runtime literal &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;SPAN&gt;&lt;!--   ScriptorStartFragment   --&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;def filter_latest_records(df, timing_column="processing_date"):&lt;/SPAN&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return df.filter(&lt;/SPAN&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; col(timing_column) &amp;gt;= expr("date_sub(current_date(), 1)")&lt;/SPAN&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;!--   ScriptorEndFragment   --&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/214541"&gt;@gkapri&lt;/a&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Feb 2026 08:54:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/147623#M52766</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2026-02-09T08:54:54Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/148507#M52903</link>
      <description>&lt;P&gt;I am using autocdc in silver and autocdc in silver requires streaming source. I tried using batch read but it failed. If you have code handy of batch read of bronze and do autocdc in silver,it would be very helpful &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/199898"&gt;@Poorva21&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Feb 2026 11:37:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/148507#M52903</guid>
      <dc:creator>Anish_2</dc:creator>
      <dc:date>2026-02-16T11:37:58Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/148510#M52904</link>
      <description>&lt;P&gt;Basically here ask is to read bronze table new files only using autoloader then do some basic transformation like renaming of columns then do autocdc in silver. But when we do readsteam on bronze, it scan whole table instead of partition pruning. Can u pls give idea how can we achieve this ?&lt;/P&gt;</description>
      <pubDate>Mon, 16 Feb 2026 11:53:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/148510#M52904</guid>
      <dc:creator>Anish_2</dc:creator>
      <dc:date>2026-02-16T11:53:20Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/150106#M53243</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/214541"&gt;@gkapri&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Thanks for the detailed writeup. I can see from the thread that you have already tried the literal variable approach that works in a notebook but does not work inside a Lakeflow Spark Declarative Pipeline (SDP). Let me explain what is happening and how to address it.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;WHY PARTITION PRUNING DOES NOT WORK WITH readStream IN SDP&lt;/P&gt;
&lt;P&gt;The root cause is how Structured Streaming reads Delta tables versus how batch reads work. These are fundamentally different processing models:&lt;/P&gt;
&lt;P&gt;1. Batch reads (spark.read) evaluate the full query plan at execution time, including partition filters. When you filter on a partition column with a literal value, Spark can push that filter down into the scan and skip entire partition directories. This is why your notebook test worked.&lt;/P&gt;
&lt;P&gt;2. Streaming reads (spark.readStream) process the Delta transaction log incrementally. The stream tracks which files have already been processed using a checkpoint, and on each micro-batch it picks up NEW files from the log. The key point is that readStream does not apply your downstream filters at the file-selection stage. Instead, it reads all new files from the transaction log first, and then your filter is applied after the data is already read into memory. This is by design, because the streaming engine needs to maintain exactly-once processing guarantees by tracking every file.&lt;/P&gt;
&lt;P&gt;3. Non-deterministic functions in SDP add another layer. Inside a Lakeflow Spark Declarative Pipeline (SDP), expressions like current_date() are evaluated at pipeline plan time. Even converting to a literal variable with spark.sql() may not resolve the issue because the SDP execution model handles variable evaluation differently than an interactive notebook.&lt;/P&gt;
&lt;P&gt;So to summarize: the 37 million rows you see being read is the stream reading all new files from the bronze table's transaction log, and then your processing_date filter is applied after the fact, resulting in only 24,722 rows being kept.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;WHAT ABOUT AUTO CDC (APPLY CHANGES)?&lt;/P&gt;
&lt;P&gt;I noticed from the follow-up comments that you and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/124640"&gt;@Anish_2&lt;/a&gt; are using AUTO CDC (formerly APPLY CHANGES) for the silver layer, which requires a streaming source. This means you cannot simply switch to a batch read (spark.read) to get partition pruning. This is a real constraint.&lt;/P&gt;
&lt;P&gt;Here are your options:&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;OPTION 1: LET THE STREAM HANDLE INCREMENTALITY (RECOMMENDED)&lt;/P&gt;
&lt;P&gt;Instead of filtering by processing_date to limit the data, rely on the streaming checkpoint to handle incrementality for you. This is actually the intended design pattern for streaming pipelines:&lt;/P&gt;
&lt;P&gt;import dlt&lt;BR /&gt;from pyspark.sql.functions import col&lt;/P&gt;
&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table()&lt;BR /&gt;def silver_table():&lt;BR /&gt;return spark.readStream.table("LIVE.bronze_table")&lt;/P&gt;
&lt;P&gt;The stream will only process NEW files that arrived since the last pipeline run. After the initial backfill (which will read all historical data once), subsequent runs will only pick up incremental data. This eliminates the need for the date filter entirely because the checkpoint ensures you never reprocess old data.&lt;/P&gt;
&lt;P&gt;If your concern is the initial backfill reading too much data, you can control the rate with:&lt;/P&gt;
&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table()&lt;BR /&gt;def silver_table():&lt;BR /&gt;return (&lt;BR /&gt;spark.readStream&lt;BR /&gt;.option("maxFilesPerTrigger", 100)&lt;BR /&gt;.table("LIVE.bronze_table")&lt;BR /&gt;)&lt;/P&gt;
&lt;P&gt;This throttles how many files are processed per micro-batch during the initial catch-up.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;OPTION 2: USE skipChangeCommits WITH A TARGETED STREAM&lt;/P&gt;
&lt;P&gt;If your bronze table receives updates or deletes and you only want appends, you can combine streaming with skipChangeCommits:&lt;/P&gt;
&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table()&lt;BR /&gt;def silver_table():&lt;BR /&gt;return (&lt;BR /&gt;spark.readStream&lt;BR /&gt;.option("skipChangeCommits", "true")&lt;BR /&gt;.table("LIVE.bronze_table")&lt;BR /&gt;)&lt;/P&gt;
&lt;P&gt;This skips any commits that contain updates or deletes, processing only append commits.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;OPTION 3: USE A MATERIALIZED VIEW FOR THE FILTERED LAYER&lt;/P&gt;
&lt;P&gt;If you truly need partition pruning with a date filter, use a materialized view for that transformation step, since materialized views use batch reads:&lt;/P&gt;
&lt;P&gt;CREATE OR REFRESH MATERIALIZED VIEW silver_filtered&lt;BR /&gt;AS&lt;BR /&gt;SELECT *&lt;BR /&gt;FROM LIVE.bronze_table&lt;BR /&gt;WHERE processing_date &amp;gt;= current_date() - INTERVAL 2 DAYS&lt;/P&gt;
&lt;P&gt;Materialized views process data using batch reads, so partition pruning works as expected. The trade-off is that materialized views fully recompute on each refresh (though Databricks does apply incremental refresh optimizations where possible).&lt;/P&gt;
&lt;P&gt;However, if you need to feed this into an AUTO CDC target, you would need an additional streaming table layer reading from this materialized view.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;OPTION 4: USE startingTimestamp TO LIMIT INITIAL LOAD&lt;/P&gt;
&lt;P&gt;If your main concern is the initial pipeline run reading too much historical data, you can set a starting point for the stream:&lt;/P&gt;
&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table()&lt;BR /&gt;def silver_table():&lt;BR /&gt;return (&lt;BR /&gt;spark.readStream&lt;BR /&gt;.option("startingTimestamp", "2026-02-01")&lt;BR /&gt;.table("LIVE.bronze_table")&lt;BR /&gt;)&lt;/P&gt;
&lt;P&gt;This tells the stream to only process changes from that timestamp forward, skipping all earlier data. After the initial run, the checkpoint takes over and only new data is processed.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;ADDITIONAL PERFORMANCE TIPS&lt;/P&gt;
&lt;P&gt;1. Enable auto-compaction on your bronze table. If your bronze streaming table creates many small files, queries (and downstream streams) will be slower. Lakeflow Spark Declarative Pipelines (SDP) enables auto-optimization by default, but verify it is working by checking the file count:&lt;/P&gt;
&lt;P&gt;DESCRIBE DETAIL your_catalog.your_schema.bronze_table&lt;/P&gt;
&lt;P&gt;2. Consider liquid clustering instead of partitioning. For new tables, liquid clustering (CLUSTER BY) often outperforms traditional partitioning because it handles data skipping more flexibly:&lt;/P&gt;
&lt;P&gt;CREATE OR REFRESH STREAMING TABLE bronze_table&lt;BR /&gt;CLUSTER BY (processing_date)&lt;BR /&gt;AS SELECT ...&lt;/P&gt;
&lt;P&gt;3. Run OPTIMIZE periodically on the bronze table if you see many small files accumulating, or enable predictive optimization at the catalog or schema level.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;KEY TAKEAWAY&lt;/P&gt;
&lt;P&gt;The fundamental insight is that streaming reads and batch reads use different mechanisms for data access. Streaming reads track the transaction log incrementally and do not apply partition pruning on downstream filters. The recommended pattern for streaming pipelines is to rely on checkpointing for incrementality rather than date-based filters. Your date filter still works for correctness (it correctly filters the output), but it cannot reduce the amount of data read at the source scan level in a streaming context.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;RELEVANT DOCUMENTATION&lt;/P&gt;
&lt;P&gt;Delta Lake as a streaming source:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/structured-streaming/delta-lake.html" target="_blank"&gt;https://docs.databricks.com/en/structured-streaming/delta-lake.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Lakeflow Spark Declarative Pipelines (SDP) streaming tables:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/ldp/concepts.html" target="_blank"&gt;https://docs.databricks.com/en/ldp/concepts.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;AUTO CDC (APPLY CHANGES) reference:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/ldp/cdc.html" target="_blank"&gt;https://docs.databricks.com/en/ldp/cdc.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Data skipping and partition pruning:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta/data-skipping.html" target="_blank"&gt;https://docs.databricks.com/en/delta/data-skipping.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Liquid clustering:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta/clustering.html" target="_blank"&gt;https://docs.databricks.com/en/delta/clustering.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;File size tuning and auto-compaction:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta/tune-file-size.html" target="_blank"&gt;https://docs.databricks.com/en/delta/tune-file-size.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Hope this helps you and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/124640"&gt;@Anish_2&lt;/a&gt;! The short answer is that this is expected behavior for streaming reads, and the best approach is to let the streaming checkpoint handle incrementality rather than using date filters to limit the scan.&lt;/P&gt;
&lt;P&gt;* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 02:53:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/150106#M53243</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T02:53:17Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151096#M53576</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133188"&gt;@SteveOstrowski&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for detailed explanation for why partition pruning don't work in streaming queries. Currently I am using dlt.view instead of dlt.table. Will @dlt.view also works as same as dlt.table? As I want to avoid creation of duplicate bronze table.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Mar 2026 00:41:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151096#M53576</guid>
      <dc:creator>Anish_2</dc:creator>
      <dc:date>2026-03-17T00:41:41Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151097#M53577</link>
      <description>&lt;P&gt;Hi Anish,&lt;/P&gt;
&lt;P&gt;Yes,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;@ dlt.view&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;@ dp.temporary_view&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in the newer Lakeflow Declarative Pipelines syntax) will work for your use case. The key difference is that a view does not materialize/persist data as a physical Delta table -- it acts as a temporary, virtual definition within the pipeline. So if your goal is to avoid creating a duplicate bronze table, using&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;@ dlt.view&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;instead of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;@ dlt.table&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is the right approach. The view defines the transformation logic without writing a separate copy of the data.&lt;/P&gt;
&lt;P&gt;One thing to keep in mind: the partition pruning limitation I described earlier applies at the streaming&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;EM&gt;read&lt;/EM&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;level, not at the materialization level. Streaming reads track the Delta transaction log incrementally and do not push down partition filters. That behavior is the same regardless of whether the downstream dataset is defined as a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;@ dlt.view&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;@ dlt.table&lt;/CODE&gt;. So the view avoids the duplicate table, but the streaming source will still scan all partitions as new files arrive.&lt;/P&gt;
&lt;P&gt;If you need partition-level filtering on the source, you would need a batch read (non-streaming) with explicit filter pushdown, but that changes the incremental processing semantics. For most bronze ingestion patterns, the streaming approach with a view is the right tradeoff.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Sources:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/delta-live-tables/python-ref" target="_blank" rel="noopener" data-saferedirecturl="https://www.google.com/url?q=https://docs.databricks.com/aws/delta-live-tables/python-ref&amp;amp;source=gmail&amp;amp;ust=1773794897950000&amp;amp;usg=AOvVaw0TTyQdUcfx-WHgVm6SVYOg"&gt;DLT Python language reference&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151096#M53576" target="_blank" rel="noopener" data-saferedirecturl="https://www.google.com/url?q=https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151096%23M53576&amp;amp;source=gmail&amp;amp;ust=1773794897950000&amp;amp;usg=AOvVaw2Txuls8WvRTRIp4ZFHWqnS"&gt;Original community thread&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 17 Mar 2026 00:52:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151097#M53577</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-17T00:52:55Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151098#M53578</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133188"&gt;@SteveOstrowski&lt;/a&gt;&amp;nbsp; currently there are 2 pipelines running in parallel. Both pipeline consists of bronze,view and silver table. 1st pipeline is having around 40+ tables and 2nd has 20. I am not able to debug why pipeline with less table is taking more time in initialising and loading of data. Could you please steps to debug why it is taking time&lt;/P&gt;</description>
      <pubDate>Tue, 17 Mar 2026 00:54:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151098#M53578</guid>
      <dc:creator>Anish_2</dc:creator>
      <dc:date>2026-03-17T00:54:48Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151100#M53580</link>
      <description>&lt;P&gt;Hi Anish,&lt;/P&gt;
&lt;P&gt;There are several things that could explain why a pipeline with fewer tables takes longer to initialize and load. The number of tables alone does not determine pipeline performance — it depends on the complexity of each flow, data volumes, and how the pipeline is configured. Here are some concrete steps to debug this:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. Check the event log for timing breakdowns&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Query the pipeline event log for&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;flow_progress&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;events to see per-table&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;duration_seconds&lt;/CODE&gt;. This will tell you exactly which tables in the slower pipeline are taking the most time during initialization and data loading. Also look at&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;update_progress&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;events to see how long the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;INITIALIZING&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;SETTING_UP_TABLES&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;phases take.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;2. Look at the INITIALIZING and SETTING_UP_TABLES phases&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;During&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;INITIALIZING&lt;/CODE&gt;, the pipeline builds its logical plan and dependency graph — complex dependencies or CDC transformations add overhead here. During&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;SETTING_UP_TABLES&lt;/CODE&gt;, schema validation and table state assessment happen for every defined table. If the 20-table pipeline has more complex transformations (e.g.,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;apply_changes&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;/ SCD operations, many views, or heavier joins), it could take longer than the 40-table pipeline with simpler flows.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;3. Check for driver bottlenecks&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Monitor driver CPU and memory. Even with fewer tables, if the individual streaming flows are more resource-intensive (larger state, more complex logic), the driver can become a bottleneck.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;4. Compare data volumes and file counts&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;operation_progress&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;events in the event log to check Auto Loader file listing counts and backlog bytes. The slower pipeline may be processing significantly more data per table, or dealing with many small files which degrade read performance.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;5. Review pipeline mode&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;If both pipelines are in triggered mode, they perform initialization steps on every trigger. Continuous pipelines only initialize on restart. If only the slower one is triggered, that could explain the difference.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Sources:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/fix-high-init" data-saferedirecturl="https://www.google.com/url?q=https://docs.databricks.com/aws/en/ldp/fix-high-init&amp;amp;source=gmail&amp;amp;ust=1773795995462000&amp;amp;usg=AOvVaw0Ss2hMGE8VYDzCvXMyZpu0" target="_blank"&gt;Fixing high initialization times in pipelines&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/monitor-event-logs" data-saferedirecturl="https://www.google.com/url?q=https://docs.databricks.com/aws/en/ldp/monitor-event-logs&amp;amp;source=gmail&amp;amp;ust=1773795995462000&amp;amp;usg=AOvVaw0I9V_aZW2i2ukE3f8VoRh2" target="_blank"&gt;Monitor pipeline event logs&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/best-practices/" data-saferedirecturl="https://www.google.com/url?q=https://docs.databricks.com/aws/en/ldp/best-practices/&amp;amp;source=gmail&amp;amp;ust=1773795995462000&amp;amp;usg=AOvVaw3PBRUEOz1m8cuz9QheMtSi" target="_blank"&gt;Pipeline best practices&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 17 Mar 2026 01:07:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151100#M53580</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-17T01:07:54Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151561#M53663</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133188"&gt;@SteveOstrowski&lt;/a&gt;&amp;nbsp; i have encountered one more issue with autocdc. Below is snipped of my autocdc where there are 2 source tables for vlc_hist. After completion, i validated data but data is missing from source 1 is still missiing from vlc_hist silver table. Could you please let me know where i went wrong?&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2026-03-21 014734.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/25091iE102B1134B683DD0/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2026-03-21 014734.png" alt="Screenshot 2026-03-21 014734.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Mar 2026 20:20:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151561#M53663</guid>
      <dc:creator>Anish_2</dc:creator>
      <dc:date>2026-03-20T20:20:07Z</dc:date>
    </item>
    <item>
      <title>Re: DLT table reading not performing file pruning on partition column</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151576#M53664</link>
      <description>&lt;P data-pm-slice="1 1 []"&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/124640"&gt;@Anish_2&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Looking at your pipeline DAG, the issue is that you have two separate APPLY CHANGES INTO flows both targeting the same silver table (ag_vlc_hist), one from ag_swt_vlchistory_historical and one from ag_swt_vlchistory. When you define multiple APPLY CHANGES INTO statements against the same target without giving each flow a unique name, they both default to using the target table name as the flow name. This means they share the same internal checkpoint, and the second flow's results effectively replace the first. That is why data from source 1 is missing.&lt;/P&gt;
&lt;P&gt;THE FIX: USE UNIQUE FLOW NAMES (RECOMMENDED)&lt;/P&gt;
&lt;P&gt;Lakeflow Spark Declarative Pipelines (SDP) supports multiple APPLY CHANGES flows targeting a single streaming table, but each flow must have a unique name. You do this with the flow_name parameter in Python or the CREATE FLOW syntax in SQL. Each named flow gets its own checkpoint, so both sources process independently and merge correctly into the target.&lt;/P&gt;
&lt;P&gt;Python example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("ag_vlc_hist")

dlt.apply_changes(
    flow_name="historical_flow",
    target="ag_vlc_hist",
    source="ag_swt_vlchistory_historical",
    keys=["your_primary_key_column"],
    sequence_by=col("your_sequence_column"),
    stored_as_scd_type=1
)

dlt.apply_changes(
    flow_name="raw_flow",
    target="ag_vlc_hist",
    source="ag_swt_vlchistory",
    keys=["your_primary_key_column"],
    sequence_by=col("your_sequence_column"),
    stored_as_scd_type=1
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;SQL example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;CREATE OR REFRESH STREAMING TABLE ag_vlc_hist;

CREATE FLOW historical_flow
  AS APPLY CHANGES INTO LIVE.ag_vlc_hist
  FROM STREAM(LIVE.ag_swt_vlchistory_historical)
  KEYS (your_primary_key_column)
  SEQUENCE BY your_sequence_column
  STORED AS SCD TYPE 1;

CREATE FLOW raw_flow
  AS APPLY CHANGES INTO LIVE.ag_vlc_hist
  FROM STREAM(LIVE.ag_swt_vlchistory)
  KEYS (your_primary_key_column)
  SEQUENCE BY your_sequence_column
  STORED AS SCD TYPE 1;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is the preferred approach because each source gets its own checkpoint and can progress independently. If one source fails or is slow, the other is unaffected.&lt;/P&gt;
&lt;P&gt;ALTERNATIVE: UNION SOURCES FIRST&lt;/P&gt;
&lt;P&gt;If you prefer a simpler setup with a single flow, you can UNION both sources into a combined view and use one APPLY CHANGES INTO:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql.functions import col

@dlt.view()
def ag_swt_vlchistory_combined():
    historical = spark.readStream.table("LIVE.ag_swt_vlchistory_historical")
    raw = spark.readStream.table("LIVE.ag_swt_vlchistory")
    return historical.unionByName(raw, allowMissingColumns=True)

dlt.create_streaming_table("ag_vlc_hist")

dlt.apply_changes(
    target="ag_vlc_hist",
    source="ag_swt_vlchistory_combined",
    keys=["your_primary_key_column"],
    sequence_by=col("your_sequence_column"),
    stored_as_scd_type=1
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This works well but couples both sources into a single checkpoint, so a failure in either source blocks the other.&lt;/P&gt;
&lt;P&gt;IMPORTANT REQUIREMENTS&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Same keys across flows: All APPLY CHANGES flows writing to the same target must use the same keys. The sequencing must also be consistent so the engine can correctly determine which record is latest when the same key appears in both sources.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Schema alignment: Make sure both sources have compatible column names and types. If they differ slightly, normalize them in the source views before they reach APPLY CHANGES.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;No mixing flow types: A table targeted by APPLY CHANGES flows can only have APPLY CHANGES flows. You cannot mix APPLY CHANGES and regular append flows on the same target.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Pipeline channel: The multiple named flows feature may require the Preview channel depending on your Databricks runtime version. If you get an error, try setting your pipeline to the Preview channel in the pipeline settings.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;DOCUMENTATION&lt;/P&gt;
&lt;P&gt;Flows in Lakeflow Spark Declarative Pipelines (SDP): &lt;A href="https://docs.databricks.com/en/dlt/flows.html" target="_blank"&gt;https://docs.databricks.com/en/dlt/flows.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;APPLY CHANGES APIs: &lt;A href="https://docs.databricks.com/en/dlt/cdc.html" target="_blank"&gt;https://docs.databricks.com/en/dlt/cdc.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Python API reference (flow_name parameter): &lt;A href="https://docs.databricks.com/en/dlt-ref/dlt-python-ref-apply-changes.html" target="_blank"&gt;https://docs.databricks.com/en/dlt-ref/dlt-python-ref-apply-changes.html&lt;/A&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Mar 2026 05:48:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-table-reading-not-performing-file-pruning-on-partition/m-p/151576#M53664</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-21T05:48:09Z</dc:date>
    </item>
  </channel>
</rss>

