<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT apply_as_deletes not working on existing data with full refresh in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107707#M42894</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146743"&gt;@sdes10&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;If full refresh is not working as expected, consider using incremental refresh for materialized views. Incremental refresh attempts to process only new or changed data, which might help in correctly applying deletes.&lt;/P&gt;
&lt;P&gt;If you are using streaming tables, you can set the &lt;CODE&gt;skipChangeCommits&lt;/CODE&gt; option to ignore file-changing operations.&lt;/P&gt;</description>
    <pubDate>Thu, 30 Jan 2025 07:25:39 GMT</pubDate>
    <dc:creator>Sidhant07</dc:creator>
    <dc:date>2025-01-30T07:25:39Z</dc:date>
    <item>
      <title>DLT apply_as_deletes not working on existing data with full refresh</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107649#M42878</link>
      <description>&lt;P&gt;I have an existing DLT pipeline that works on a modified medallion architecture. Data is sent from debezium to kafka and lands into a bronze table. From bronze table, it goes to a silver table where it is schematized. Finally to a good table where I use apply_changes to handle upserts and deletes. I added apply_as_deletes to an existing pipeline and table and i am seeing a weird issue. The deletes works on new incoming new data but when i do a full refresh of silver and gold table, it does not delete existing data from the table. Instead what i am seeing is that rows exist which have both __UPSERTVERSION and __DELETEVERSION.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The rows exists in silver table like this&lt;/P&gt;&lt;P&gt;id, metadata, operation_ts, op&lt;/P&gt;&lt;P&gt;1, m1, ts1, c&lt;/P&gt;&lt;P&gt;1, m1, ts2, d&lt;/P&gt;&lt;P&gt;my apply_changes looks like this&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;dlt.create_streaming_table(&lt;SPAN&gt;name&lt;/SPAN&gt;=dest_scd_table)&lt;BR /&gt;dlt.apply_changes(&lt;BR /&gt;&lt;SPAN&gt;target &lt;/SPAN&gt;= dest_scd_table, &lt;SPAN&gt;#The table being materilized&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;source &lt;/SPAN&gt;= source_table, &lt;SPAN&gt;#the incoming CDC&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;keys &lt;/SPAN&gt;= keys, &lt;SPAN&gt;#what we'll be using to match the rows to upsert&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;sequence_by &lt;/SPAN&gt;= col(seq_col), &lt;SPAN&gt;#we deduplicate by operation date getting the most recent value&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;ignore_null_updates &lt;/SPAN&gt;= &lt;SPAN&gt;False&lt;/SPAN&gt;,&lt;BR /&gt;&lt;SPAN&gt;stored_as_scd_type &lt;/SPAN&gt;= scd_type,&lt;BR /&gt;&lt;SPAN&gt;apply_as_deletes &lt;/SPAN&gt;= expr(&lt;SPAN&gt;"op = 'd'"&lt;/SPAN&gt;), &lt;SPAN&gt;# If row is deleted in postgres, we will delete the row in the _final table. It will still exist in append-only tables.&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;except_column_list &lt;/SPAN&gt;= [col(&lt;SPAN&gt;"op"&lt;/SPAN&gt;)] &lt;SPAN&gt;# List of columns from the silver tables we do not want to appear in the gold tables.&lt;BR /&gt;&lt;/SPAN&gt;)&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;This query gives me the correct count if i filter by it&lt;/P&gt;&lt;P&gt;select count(*) from &amp;lt;table&amp;gt; where __DROP_EXPECTATIONS_COL is not null&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 18:46:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107649#M42878</guid>
      <dc:creator>sdes10</dc:creator>
      <dc:date>2025-01-29T18:46:04Z</dc:date>
    </item>
    <item>
      <title>Re: DLT apply_as_deletes not working on existing data with full refresh</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107669#M42883</link>
      <description>&lt;P&gt;for context, i started a new pipeline and the gold table in DLT UI says 38k upserted records, 0 deleted records. The gold table has ~200k records(includes records that should've been deleted).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 22:22:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107669#M42883</guid>
      <dc:creator>sdes10</dc:creator>
      <dc:date>2025-01-29T22:22:52Z</dc:date>
    </item>
    <item>
      <title>Re: DLT apply_as_deletes not working on existing data with full refresh</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107707#M42894</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146743"&gt;@sdes10&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;If full refresh is not working as expected, consider using incremental refresh for materialized views. Incremental refresh attempts to process only new or changed data, which might help in correctly applying deletes.&lt;/P&gt;
&lt;P&gt;If you are using streaming tables, you can set the &lt;CODE&gt;skipChangeCommits&lt;/CODE&gt; option to ignore file-changing operations.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jan 2025 07:25:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107707#M42894</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-01-30T07:25:39Z</dc:date>
    </item>
    <item>
      <title>Re: DLT apply_as_deletes not working on existing data with full refresh</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107899#M42943</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36707"&gt;@Sidhant07&lt;/a&gt;&amp;nbsp;how do i use&amp;nbsp;&lt;SPAN&gt;skipChangeCommits? The idea is that i have a bronze, silver and gold table already built. Now i am enabling deletes on gold table in the apply_changes API. The silver table is added with operation column (values c,u,r,d). I did a full refresh of silver table to propagate the values in this operation column and then want the gold table to delete those rows with operation='d'. So i did a full refresh of silver and gold. Silver table gets operation value populated but in gold table, the rows with operation='delete' persist. The value for columns __UpsertVersion and __DeleteVersion are the same, which i believe is what is causing them to not get deleted. I dont know how these columns get populated.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jan 2025 17:53:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-apply-as-deletes-not-working-on-existing-data-with-full/m-p/107899#M42943</guid>
      <dc:creator>sdes10</dc:creator>
      <dc:date>2025-01-30T17:53:45Z</dc:date>
    </item>
  </channel>
</rss>

