<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Need help with DLT Pipeline in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95652#M39140</link>
    <description>&lt;P&gt;I have a DLT pipeline that has been running daily for months. I recently found a bug in my silver layer code, and as a result I now have faulty data in my silver schema. Note that the tables in the silver schema are streaming tables managed by DLT.&lt;BR /&gt;I want to delete the faulty records from my silver schema and keep the pipeline running normally with the corrected code. Could someone suggest the best approach for deleting the faulty records from my silver streaming tables? An early response would be highly appreciated.&lt;/P&gt;</description>
    <pubDate>Wed, 23 Oct 2024 07:24:34 GMT</pubDate>
    <dc:creator>Fatimah-Tariq</dc:creator>
    <dc:date>2024-10-23T07:24:34Z</dc:date>
    <item>
      <title>Need help with DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95652#M39140</link>
      <description>&lt;P&gt;I have a DLT pipeline that has been running daily for months. I recently found a bug in my silver layer code, and as a result I now have faulty data in my silver schema. Note that the tables in the silver schema are streaming tables managed by DLT.&lt;BR /&gt;I want to delete the faulty records from my silver schema and keep the pipeline running normally with the corrected code. Could someone suggest the best approach for deleting the faulty records from my silver streaming tables? An early response would be highly appreciated.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2024 07:24:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95652#M39140</guid>
      <dc:creator>Fatimah-Tariq</dc:creator>
      <dc:date>2024-10-23T07:24:34Z</dc:date>
    </item>
    <item>
      <title>Re: Need help with DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95661#M39142</link>
      <description>&lt;P&gt;Use Delta Lake's DELETE command to remove the faulty records from your Silver tables. You can do this in a Databricks notebook or a separate script.&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import SparkSession

# In a Databricks notebook this returns the existing session
spark = SparkSession.builder.appName("DeleteFaultyRecords").getOrCreate()

# Define the criteria for faulty records (a SQL predicate)
faulty_criteria = "your_faulty_criteria_here"

# List of Silver tables to clean
silver_tables = ["silver_table1", "silver_table2"]

for table in silver_tables:
    spark.sql(f"DELETE FROM {table} WHERE {faulty_criteria}")&lt;/PRE&gt;&lt;P&gt;Also, ensure that your DLT pipeline code is corrected to prevent future faulty records. Update the transformation logic in your Silver layer (Spark DataFrame transformations or constraints) so that it handles the data correctly.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2024 08:14:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95661#M39142</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2024-10-23T08:14:31Z</dc:date>
    </item>
    <item>
      <title>Re: Need help with DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95666#M39144</link>
      <description>&lt;P&gt;Hi Fatimah,&lt;/P&gt;&lt;P&gt;You can delete the records from the silver layer as long as they don't get reloaded from bronze. More info is &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/transform#manual-ddl" target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Also, are you using the CDC apply_changes method to load data into silver (for example, SCD type 1)? If not, the approach above will definitely work.&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2024 08:26:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95666#M39144</guid>
      <dc:creator>AngadSingh</dc:creator>
      <dc:date>2024-10-23T08:26:43Z</dc:date>
    </item>
    <item>
      <title>Re: Need help with DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95668#M39145</link>
      <description>&lt;P&gt;Yes, I'm using the CDC apply_changes method with SCD type 1.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2024 08:33:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/95668#M39145</guid>
      <dc:creator>Fatimah-Tariq</dc:creator>
      <dc:date>2024-10-23T08:33:05Z</dc:date>
    </item>
    <item>
      <title>Re: Need help with DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/96203#M39236</link>
      <description>&lt;P&gt;In that case, what expression do you pass to the "apply_as_deletes" option? Or could you share your CDC apply_changes code block?&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2024 21:04:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/96203#M39236</guid>
      <dc:creator>AngadSingh</dc:creator>
      <dc:date>2024-10-25T21:04:08Z</dc:date>
    </item>
    <item>
      <title>Re: Need help with DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/96381#M39267</link>
      <description>&lt;P&gt;I do not use the "apply_as_deletes" option.&lt;BR /&gt;Here's my apply_changes code:&lt;/P&gt;&lt;PRE&gt;import dlt
from pyspark.sql.functions import col

# silver_table, bronze_dlt_view, primary_keys and sequence_col are defined elsewhere in the pipeline
dlt.apply_changes(
    target=silver_table,
    source=bronze_dlt_view,
    keys=primary_keys,
    sequence_by=col(sequence_col),
    stored_as_scd_type=1,
    except_column_list=["extract_datetime_utc", "_rescued_data", "dlt_extract_datetime_utc"],
)&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 Oct 2024 06:37:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-with-dlt-pipeline/m-p/96381#M39267</guid>
      <dc:creator>Fatimah-Tariq</dc:creator>
      <dc:date>2024-10-28T06:37:29Z</dc:date>
    </item>
  </channel>
</rss>

