<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Deleting Records from DLT Bronze and Silver Tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/92508#M38456</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119378"&gt;@Gianfranco&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;How are you doing it today?&lt;/P&gt;&lt;P&gt;As I understand it, you could consider &lt;STRONG&gt;using a selective delete&lt;/STRONG&gt; approach on both the Bronze and Silver tables to avoid a full refresh. Instead of deleting data from Bronze and then refreshing the entire Silver table, you could &lt;STRONG&gt;delete the specific records in both tables&lt;/STRONG&gt; by running a targeted DELETE query directly against each Delta table. That way, you remove only the required records and never rebuild the Silver table from scratch. Another option is to &lt;STRONG&gt;implement Change Data Capture (CDC)&lt;/STRONG&gt;, so that only the affected records are tracked and updated, reducing the need for full-table operations. Finally, after the deletions, run &lt;STRONG&gt;OPTIMIZE&lt;/STRONG&gt; to compact the rewritten files and &lt;STRONG&gt;VACUUM&lt;/STRONG&gt; to remove the old data files, which still contain the deleted records until they are vacuumed.&lt;/P&gt;&lt;P&gt;Give it a try and see if it works.&lt;/P&gt;&lt;P&gt;Good day.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
    <pubDate>Wed, 02 Oct 2024 03:58:12 GMT</pubDate>
    <dc:creator>Brahmareddy</dc:creator>
    <dc:date>2024-10-02T03:58:12Z</dc:date>
    <item>
      <title>Deleting Records from DLT Bronze and Silver Tables</title>
      <link>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/91741#M38253</link>
      <description>&lt;P class=""&gt;I have a pipeline that generates two DLT streaming tables: a Bronze table and a Silver table. I need to delete specific records from both tables. I've read an article (&lt;SPAN&gt;&lt;SPAN class=""&gt;&lt;A class="" title="https://www.databricks.com/blog/handling-right-be-forgotten-gdpr-and-ccpa-using-delta-live-tables-dlt" href="https://www.databricks.com/blog/handling-right-be-forgotten-gdpr-and-ccpa-using-delta-live-tables-dlt" target="_blank" rel="noreferrer noopener"&gt;https://www.databricks.com/blog/handling-right-be-forgotten-gdpr-and-ccpa-using-delta-live-tables-dlt&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;) suggesting that with two streaming tables, the only option is to delete the data in the Bronze table and then perform a full refresh of the Silver table.&lt;/P&gt;&lt;P class=""&gt;However, the Silver table is very large, and I'd like to avoid a full refresh. Are there any alternative solutions available?&lt;/P&gt;&lt;P class=""&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 25 Sep 2024 15:01:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/91741#M38253</guid>
      <dc:creator>Gianfranco</dc:creator>
      <dc:date>2024-09-25T15:01:07Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Records from DLT Bronze and Silver Tables</title>
      <link>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/92508#M38456</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119378"&gt;@Gianfranco&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;How are you doing it today?&lt;/P&gt;&lt;P&gt;As I understand it, you could consider &lt;STRONG&gt;using a selective delete&lt;/STRONG&gt; approach on both the Bronze and Silver tables to avoid a full refresh. Instead of deleting data from Bronze and then refreshing the entire Silver table, you could &lt;STRONG&gt;delete the specific records in both tables&lt;/STRONG&gt; by running a targeted DELETE query directly against each Delta table. That way, you remove only the required records and never rebuild the Silver table from scratch. Another option is to &lt;STRONG&gt;implement Change Data Capture (CDC)&lt;/STRONG&gt;, so that only the affected records are tracked and updated, reducing the need for full-table operations. Finally, after the deletions, run &lt;STRONG&gt;OPTIMIZE&lt;/STRONG&gt; to compact the rewritten files and &lt;STRONG&gt;VACUUM&lt;/STRONG&gt; to remove the old data files, which still contain the deleted records until they are vacuumed.&lt;/P&gt;&lt;P&gt;Give it a try and see if it works.&lt;/P&gt;&lt;P&gt;Good day.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2024 03:58:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/92508#M38456</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2024-10-02T03:58:12Z</dc:date>
    </item>
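    <!--
A minimal sketch of the selective delete suggested in the reply above. It assumes the records to remove can be identified by a single key column. The table names (`main.sales.customers_bronze`, `main.sales.customers_silver`) and the key column `customer_id` are hypothetical placeholders and do not come from the thread. The helper only builds the SQL strings. In a Databricks notebook, each one would be run with `spark.sql(stmt)`.

```python
"""Sketch: build selective DELETE statements for the Bronze and Silver tables.

Hypothetical: table names, key column, and key values are placeholders.
Each returned statement would be executed with spark.sql(stmt).
"""


def quote_sql_string(value: str) -> str:
    # Escape embedded single quotes by doubling them, then wrap in quotes.
    return "'" + value.replace("'", "''") + "'"


def build_delete_statements(tables: list[str], key_column: str, keys: list[str]) -> list[str]:
    """Return one DELETE per table that removes only rows whose key is in `keys`."""
    if not keys:
        # An empty IN list is invalid SQL; refuse instead of guessing.
        raise ValueError("keys must not be empty")
    in_list = ", ".join(quote_sql_string(k) for k in keys)
    # Same predicate for every table, so Bronze and Silver stay consistent.
    return [f"DELETE FROM {table} WHERE {key_column} IN ({in_list})" for table in tables]


if __name__ == "__main__":
    # Hypothetical tables, Bronze first and then Silver.
    for stmt in build_delete_statements(
        ["main.sales.customers_bronze", "main.sales.customers_silver"],
        "customer_id",
        ["c001", "c002"],
    ):
        print(stmt)
```

There is one caveat to check. The Silver streaming table reads Bronze as a Delta streaming source, and by default such a stream fails when it encounters a commit that deletes or updates existing rows. Setting `skipChangeCommits` to `true` on that read (for example `spark.readStream.option("skipChangeCommits", "true").table(...)`) lets the stream ignore those commits. Because those commits are ignored rather than propagated, the same rows also have to be deleted from Silver directly.
    -->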
    <item>
      <title>Re: Deleting Records from DLT Bronze and Silver Tables</title>
      <link>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/92530#M38458</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119378"&gt;@Gianfranco&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Try APPLY CHANGES. It will work well in your scenario, as it also supports DELETE operations (through the APPLY AS DELETE WHEN clause):&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta-live-tables/cdc.html" target="_blank"&gt;https://docs.databricks.com/en/delta-live-tables/cdc.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2024 06:40:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/92530#M38458</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-10-02T06:40:04Z</dc:date>
    </item>
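    <!--
This is a pure-Python model of the semantics behind APPLY CHANGES. It assumes SCD type 1 (only the latest row per key is kept) and a single batch of change events. It is not the DLT API. In a pipeline you would call `dlt.apply_changes(...)` with `keys`, `sequence_by`, and `apply_as_deletes`, or use SQL `APPLY CHANGES INTO` with an `APPLY AS DELETE WHEN` clause. The `ChangeEvent` class and the `"UPSERT"`/`"DELETE"` labels are hypothetical.

```python
"""Model of APPLY CHANGES semantics (SCD type 1, single batch). Not the DLT API."""
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChangeEvent:
    key: str                 # primary key, like the `keys` argument
    seq: int                 # ordering column, like `sequence_by`
    op: str                  # hypothetical labels: "UPSERT" or "DELETE"
    data: dict[str, Any] = field(default_factory=dict)


def apply_changes(target: dict[str, dict[str, Any]],
                  events: list[ChangeEvent]) -> dict[str, dict[str, Any]]:
    """Return a new keyed table with the events applied in sequence order."""
    result = dict(target)  # copy so the caller's table is not mutated
    # Sorting by sequence means the newest event per key is applied last,
    # even if events arrived out of order.
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.op == "DELETE":
            # Like apply_as_deletes: drop the row for this key if present.
            result.pop(ev.key, None)
        else:
            # Upsert: the newer event overwrites the stored row.
            result[ev.key] = ev.data
    return result


if __name__ == "__main__":
    silver = {"c1": {"name": "Ann"}, "c2": {"name": "Bob"}}
    events = [
        ChangeEvent("c2", 5, "DELETE"),  # e.g. a right-to-be-forgotten request
        ChangeEvent("c1", 3, "UPSERT", {"name": "Ann B."}),
        ChangeEvent("c3", 4, "UPSERT", {"name": "Cy"}),
    ]
    print(apply_changes(silver, events))
```

Events are applied in sequence order. A delete with a higher sequence number therefore wins over an older upsert for the same key, even when the delete appears earlier in the input. Only the affected keys are touched, so the rest of the table is not rebuilt.
    -->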
    <item>
      <title>Re: Deleting Records from DLT Bronze and Silver Tables</title>
      <link>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/103376#M41423</link>
      <description>&lt;P&gt;Remove the records with a DELETE statement on both the Bronze &amp;amp; Silver tables.&lt;/P&gt;&lt;P&gt;After each delete, you can run OPTIMIZE on the table, which rewrites its Parquet files behind the scenes to improve the data layout (read more about OPTIMIZE here: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-optimize" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-optimize&lt;/A&gt;). It is a simple command: &lt;STRONG&gt;&lt;EM&gt;OPTIMIZE &amp;lt;Table Name&amp;gt;&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Also, once the deletes are done, you can run the VACUUM command to clean up old versions of the data and free up storage space. Until then, the deleted records still exist in those older files: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-vacuum" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-vacuum&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Dec 2024 04:29:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/deleting-records-from-dlt-bronze-and-silver-tables/m-p/103376#M41423</guid>
      <dc:creator>karthickrs</dc:creator>
      <dc:date>2024-12-28T04:29:44Z</dc:date>
    </item>
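    <!--
This is a minimal sketch of the follow-up maintenance described in this reply. It generates OPTIMIZE and then VACUUM for each table. The table names are hypothetical placeholders, and each statement would be run with `spark.sql(stmt)`. The 168-hour floor assumes the table keeps Delta's default 7-day retention. Delta blocks shorter VACUUM retention by default unless `spark.databricks.delta.retentionDurationCheck.enabled` is set to false.

```python
"""Sketch: OPTIMIZE then VACUUM statements after a selective delete.

Hypothetical: table names are placeholders; run each with spark.sql(stmt).
"""

# Assumes the table keeps Delta's default 7-day file retention.
DEFAULT_RETAIN_HOURS = 168


def maintenance_statements(tables: list[str],
                           retain_hours: int = DEFAULT_RETAIN_HOURS) -> list[str]:
    """Return OPTIMIZE followed by VACUUM for each table, in that order."""
    if retain_hours < DEFAULT_RETAIN_HOURS:
        # Mirror Delta's default safety check instead of emitting a
        # statement that would be rejected at run time.
        raise ValueError("retain_hours below 168 is blocked by Delta's default check")
    stmts: list[str] = []
    for table in tables:
        # Compact the small files left behind by the DELETE first ...
        stmts.append(f"OPTIMIZE {table}")
        # ... then remove unreferenced files older than the retention window.
        stmts.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return stmts


if __name__ == "__main__":
    for stmt in maintenance_statements(["main.sales.customers_bronze",
                                        "main.sales.customers_silver"]):
        print(stmt)
```

The VACUUM step matters for right-to-be-forgotten requests. DELETE does not physically erase the old data files. The deleted rows can remain readable through time travel until VACUUM removes those files after the retention period. If deletion vectors are enabled on the table, the affected files may first need rewriting (for example with `REORG TABLE` and `APPLY (PURGE)`) before VACUUM can remove them.
    -->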
  </channel>
</rss>

