<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Incremental Load Without any keys in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157084#M11784</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229992"&gt;@Rupa0503&lt;/a&gt;&amp;nbsp;The correct approach is to have a key identifier to match and merge the records. You could also do a full load (expensive, but it works).&amp;nbsp;Databricks' official recommendation is Delta Change Data Feed.&lt;/P&gt;</description>
    <pubDate>Sun, 17 May 2026 06:11:02 GMT</pubDate>
    <dc:creator>Sumit_7</dc:creator>
    <dc:date>2026-05-17T06:11:02Z</dc:date>
    <item>
      <title>Incremental Load Without any keys</title>
      <link>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157051#M11783</link>
      <description>&lt;P&gt;I am performing an incremental load where I want to insert or update data, but there are no date columns or keys. How can I do the incremental load from the Silver layer to the Gold layer?&lt;/P&gt;</description>
      <pubDate>Sat, 16 May 2026 17:11:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157051#M11783</guid>
      <dc:creator>Rupa0503</dc:creator>
      <dc:date>2026-05-16T17:11:16Z</dc:date>
    </item>
    <item>
      <title>Re: Incremental Load Without any keys</title>
      <link>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157084#M11784</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229992"&gt;@Rupa0503&lt;/a&gt;&amp;nbsp;The correct approach is to have a key identifier to match and merge the records. You could also do a full load (expensive, but it works).&amp;nbsp;Databricks' official recommendation is Delta Change Data Feed.&lt;/P&gt;</description>
      <pubDate>Sun, 17 May 2026 06:11:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157084#M11784</guid>
      <dc:creator>Sumit_7</dc:creator>
      <dc:date>2026-05-17T06:11:02Z</dc:date>
    </item>
    <item>
      <title>Re: Incremental Load Without any keys</title>
      <link>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157091#M11785</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229992"&gt;@Rupa0503&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can use one of the approaches below.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT size="3"&gt;&lt;STRONG&gt;Key Hashing &amp;amp; Merge&lt;/STRONG&gt;&amp;nbsp;- You can create a synthetic key in the Silver layer by concatenating all data columns in a row (with a delimiter between columns and a placeholder for nulls, so that different rows cannot produce the same string) and applying a hash function such as MD5 or SHA-2.&amp;nbsp;&lt;/FONT&gt;&lt;FONT size="3"&gt;The hash acts as a fingerprint for that exact snapshot of the row. When moving data to Gold, you can use a MERGE operation that matches solely on this generated hash. If the hash already exists in Gold, the row is ignored as a duplicate. If it does not exist, the row is inserted as a new entry. If an upstream record changes, it produces a new hash and is appended to Gold as a new row version, which preserves historical states without a traditional update key. Note that this is insert-only: without a business key, the old version cannot be updated in place.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;&lt;FONT size="3"&gt;Structured Streaming -&amp;nbsp;&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT size="3"&gt;If your Silver layer is strictly append-only (new data arrives continuously, but existing records are never modified or rewritten upstream), you can replace batch processing with Structured Streaming.&amp;nbsp;&lt;/FONT&gt;&lt;FONT size="3"&gt;Structured Streaming uses checkpoints to track which data has already been processed, so each trigger reads only what was added to the Silver table since the previous one. Because this tracking works at the file level rather than the data or column level, it pushes incremental data to Gold without needing a key or timestamp.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;&lt;FONT size="3"&gt;Change Data Feed (CDF) - &lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT size="3"&gt;If the Silver layer does experience modifications (such as overwrites or merges) and you cannot rely on a simple append stream, you can enable Delta Lake’s Change Data Feed (CDF) on the Silver table.&amp;nbsp;&lt;/FONT&gt;&lt;FONT size="3"&gt;CDF records row-level changes and exposes metadata columns that indicate whether a row is an insert, a delete, or the pre-image or post-image of an update. By streaming from the Change Data Feed, you can pull only the rows that changed since the last processed commit and feed them into your Gold logic without scanning the full table.&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
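      <!--
A minimal sketch of the key hashing and merge idea from the reply above, written in plain Python rather than Spark so that it runs anywhere. The names row_hash, merge_insert_only, SEP and NULL_TOKEN are hypothetical and are not part of any Databricks API; the in-memory dict only stands in for the Gold table.

```python
import hashlib

# Hypothetical separator: the ASCII unit separator is unlikely to occur in column values.
SEP = "\x1f"
# Hypothetical placeholder so that a null and an empty string hash differently.
NULL_TOKEN = "\x00NULL"


def row_hash(row, columns):
    """Return a SHA-256 fingerprint of the listed columns of one row."""
    parts = [NULL_TOKEN if row[c] is None else str(row[c]) for c in columns]
    return hashlib.sha256(SEP.join(parts).encode("utf-8")).hexdigest()


def merge_insert_only(gold, silver_rows, columns):
    """Insert rows whose hash is not yet in gold; return how many were inserted."""
    inserted = 0
    for row in silver_rows:
        key = row_hash(row, columns)
        if key not in gold:  # an existing hash means an exact duplicate, so skip it
            gold[key] = dict(row)
            inserted += 1
    return inserted


columns = ["customer", "city"]
gold = {}

# Two distinct rows that would collide if concatenated without a separator.
first = merge_insert_only(
    gold,
    [{"customer": "ab", "city": "c"}, {"customer": "a", "city": "bc"}],
    columns,
)

# One exact duplicate and one changed row (city became null).
second = merge_insert_only(
    gold,
    [{"customer": "ab", "city": "c"}, {"customer": "ab", "city": None}],
    columns,
)
```

On Databricks the same idea would typically be expressed as a hash column on the Silver table and a MERGE into Gold that matches on it. The separator and null placeholder are there so that distinct rows cannot concatenate to the same string.
      -->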
      <pubDate>Sun, 17 May 2026 08:29:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157091#M11785</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-05-17T08:29:02Z</dc:date>
    </item>
    <item>
      <title>Re: Incremental Load Without any keys</title>
      <link>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157711#M11802</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229992"&gt;@Rupa0503&lt;/a&gt;&amp;nbsp;A watermark doesn't need a date column from Silver or Gold. You can use the pipeline run timestamp as the watermark, which tracks when the pipeline last ran rather than when the data was modified. You can also use the Delta table version history as the watermark, since it needs no date column either.&lt;/P&gt;&lt;P&gt;Second, you can use Delta Change Data Feed, which is best suited to row-level changes when there is no date column.&lt;/P&gt;</description>
      <pubDate>Wed, 27 May 2026 08:00:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/incremental-load-without-any-keys/m-p/157711#M11802</guid>
      <dc:creator>pragya17</dc:creator>
      <dc:date>2026-05-27T08:00:46Z</dc:date>
    </item>
  </channel>
</rss>

