<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT - Handling Merge in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-handling-merge/m-p/109193#M43237</link>
<description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;To address the challenges in your Delta Live Tables (DLT) pipeline, here are some steps to help you read the Apply Changes table incrementally and join it with the streaming live table for SCD Type 1 processing:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Incremental Data Reading from Apply Changes Table&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Ensure that the Apply Changes table is set up to capture changes using the &lt;CODE&gt;APPLY CHANGES&lt;/CODE&gt; API. This API is designed to handle change data capture (CDC) efficiently.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Use the &lt;CODE&gt;apply_changes()&lt;/CODE&gt; function in Python to specify the source, keys, and sequencing for the change feed. This function will help you process changes incrementally.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Handling Out-of-Order Data&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;The &lt;CODE&gt;APPLY CHANGES&lt;/CODE&gt; API handles out-of-sequence records automatically, ensuring correct processing of CDC records. Specify a sequencing column in the source data (for example, a timestamp or log sequence number); Delta Live Tables interprets it as a monotonically increasing representation of the proper ordering of the source data.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Joining Tables with Time Lag&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;To manage the time lag between the two sources, consider using a watermark to handle late data. This can help ensure that you do not lose records during the join operation.&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Use the &lt;CODE&gt;apply_changes()&lt;/CODE&gt; function to maintain the SCD Type 1 target, and perform the join on the streaming inputs upstream of it. Note that the target of &lt;CODE&gt;APPLY CHANGES&lt;/CODE&gt; receives updates and deletes, so it typically cannot be consumed downstream as an append-only streaming source; join against the upstream change feed instead, or read the target as a complete (non-streaming) input.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
    <pubDate>Thu, 06 Feb 2025 13:29:25 GMT</pubDate>
    <dc:creator>Walter_C</dc:creator>
    <dc:date>2025-02-06T13:29:25Z</dc:date>
    <item>
      <title>DLT - Handling Merge</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-handling-merge/m-p/109188#M43235</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;In our DLT pipeline we read two tables: an Apply Changes Delta table and a streaming live table. We can read the latest records from the streaming live table incrementally, but we cannot read incremental data from the Apply Changes table. We now need to join these two tables in real time to perform SCD Type 1 on our target DLT table.&lt;BR /&gt;Challenges: although both sources share common records, they can arrive with a time lag, so a real-time inner join loses records, and we are unable to read the Apply Changes table incrementally.&lt;BR /&gt;How can we handle this situation so that records from both sources are inserted/updated into our target DLT table without losing records?&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2025 13:11:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-handling-merge/m-p/109188#M43235</guid>
      <dc:creator>JothyGanesan</dc:creator>
      <dc:date>2025-02-06T13:11:52Z</dc:date>
    </item>
    <item>
      <title>Re: DLT - Handling Merge</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-handling-merge/m-p/109193#M43237</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;To address the challenges in your Delta Live Tables (DLT) pipeline, here are some steps to help you read the Apply Changes table incrementally and join it with the streaming live table for SCD Type 1 processing:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Incremental Data Reading from Apply Changes Table&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Ensure that the Apply Changes table is set up to capture changes using the &lt;CODE&gt;APPLY CHANGES&lt;/CODE&gt; API. This API is designed to handle change data capture (CDC) efficiently.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Use the &lt;CODE&gt;apply_changes()&lt;/CODE&gt; function in Python to specify the source, keys, and sequencing for the change feed. This function will help you process changes incrementally.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Handling Out-of-Order Data&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;The &lt;CODE&gt;APPLY CHANGES&lt;/CODE&gt; API handles out-of-sequence records automatically, ensuring correct processing of CDC records. Specify a sequencing column in the source data (for example, a timestamp or log sequence number); Delta Live Tables interprets it as a monotonically increasing representation of the proper ordering of the source data.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Joining Tables with Time Lag&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;To manage the time lag between the two sources, consider using a watermark to handle late data. This can help ensure that you do not lose records during the join operation.&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Use the &lt;CODE&gt;apply_changes()&lt;/CODE&gt; function to maintain the SCD Type 1 target, and perform the join on the streaming inputs upstream of it. Note that the target of &lt;CODE&gt;APPLY CHANGES&lt;/CODE&gt; receives updates and deletes, so it typically cannot be consumed downstream as an append-only streaming source; join against the upstream change feed instead, or read the target as a complete (non-streaming) input.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
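The steps above could be sketched as a DLT pipeline in Python. This is a minimal sketch under stated assumptions, not a drop-in solution: the table names (customers_cdc_feed, orders_stream), the key column customer_id, the sequencing columns event_ts and order_ts, and the 30-minute watermark interval are all illustrative and would need to match your own schema. It requires a Databricks DLT runtime to execute.

```python
# Hypothetical sketch: table names, keys, and timestamp columns are assumptions.
import dlt
from pyspark.sql.functions import col, expr

# 1) Maintain the SCD Type 1 target from the CDC feed. sequence_by tells DLT
#    how to order out-of-sequence records.
dlt.create_streaming_table("customers_current")

dlt.apply_changes(
    target="customers_current",
    source="customers_cdc_feed",     # assumed upstream CDC view/table
    keys=["customer_id"],            # primary key of the dimension
    sequence_by=col("event_ts"),     # monotonically increasing ordering column
    apply_as_deletes=expr("op = 'DELETE'"),  # assumed operation-type column
    stored_as_scd_type=1,
)

# 2) Join the two streaming sources with watermarks so that records arriving
#    with a time lag are held in join state instead of being dropped.
#    Joining the upstream CDC feed (rather than the apply_changes target)
#    keeps both inputs append-only streaming sources.
@dlt.table
def joined_events():
    orders = (
        dlt.read_stream("orders_stream")
        .withWatermark("order_ts", "30 minutes")
    )
    customers = (
        dlt.read_stream("customers_cdc_feed")
        .withWatermark("event_ts", "30 minutes")
    )
    # A plain equi-join works; adding a time-range condition on the two
    # timestamps would additionally let Spark purge join state once the
    # watermark passes.
    return orders.join(customers, "customer_id")
```

The watermark interval trades completeness against state size: a longer interval tolerates a larger lag between the two sources before the inner join drops a record, at the cost of keeping more state.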
      <pubDate>Thu, 06 Feb 2025 13:29:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-handling-merge/m-p/109193#M43237</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-02-06T13:29:25Z</dc:date>
    </item>
  </channel>
</rss>

