<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Need advice for a big source table DLT Pipeline in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/need-advice-for-a-big-source-table-dlt-pipeline/m-p/119061#M45783</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I was hoping to get advice from someone with experience in DLT pipelines. I want to apologize in advance if this is a noob question; I'm really new to DLT, materialized views, and streaming tables.&lt;/P&gt;&lt;P&gt;Here is my scenario: my source is a big sales Delta table (1B+ records) that is shared with my team via Unity Catalog, and I want to ingest this table into my own catalog with daily updates.&lt;/P&gt;&lt;P&gt;What would be the best-practice approach to doing this via a DLT pipeline? I ask because, even after doing my research, I still can't wrap my head around incremental loads with DLT pipelines, mostly because I don't want to do a full refresh of a 1B+ record table each day.&lt;/P&gt;&lt;P&gt;Let me know if more details are needed. Thanks a lot in advance!&lt;/P&gt;</description>
    <pubDate>Tue, 13 May 2025 15:38:33 GMT</pubDate>
    <dc:creator>MauricioS</dc:creator>
    <dc:date>2025-05-13T15:38:33Z</dc:date>
    <item>
      <title>Need advice for a big source table DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-advice-for-a-big-source-table-dlt-pipeline/m-p/119061#M45783</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I was hoping to get advice from someone with experience in DLT pipelines. I want to apologize in advance if this is a noob question; I'm really new to DLT, materialized views, and streaming tables.&lt;/P&gt;&lt;P&gt;Here is my scenario: my source is a big sales Delta table (1B+ records) that is shared with my team via Unity Catalog, and I want to ingest this table into my own catalog with daily updates.&lt;/P&gt;&lt;P&gt;What would be the best-practice approach to doing this via a DLT pipeline? I ask because, even after doing my research, I still can't wrap my head around incremental loads with DLT pipelines, mostly because I don't want to do a full refresh of a 1B+ record table each day.&lt;/P&gt;&lt;P&gt;Let me know if more details are needed. Thanks a lot in advance!&lt;/P&gt;</description>
      <pubDate>Tue, 13 May 2025 15:38:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-advice-for-a-big-source-table-dlt-pipeline/m-p/119061#M45783</guid>
      <dc:creator>MauricioS</dc:creator>
      <dc:date>2025-05-13T15:38:33Z</dc:date>
    </item>
    <item>
      <title>Re: Need advice for a big source table DLT Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/need-advice-for-a-big-source-table-dlt-pipeline/m-p/119066#M45785</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/144758"&gt;@MauricioS&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Absolutely not a noob question: you're touching on a common and important challenge in DLT pipelines, especially when dealing with large shared Delta tables and incremental ingestion from Unity Catalog sources.&lt;/P&gt;&lt;P&gt;Let's break it down so it's simple, scalable, and DLT-native. The goal: ingest a shared Delta table (Unity Catalog) into your own catalog, incrementally, with daily updates, using DLT.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Best Practice with DLT Pipelines (Incremental Load)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 1: Use a STREAMING LIVE TABLE to Enable Incremental Loads&lt;/STRONG&gt;&lt;BR /&gt;DLT supports incremental ingestion natively via streaming reads, even if the source table is not itself a streaming table. DLT tracks offsets/checkpoints automatically, so you don't reprocess old data.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 2: Optional Watermark for Late Records&lt;/STRONG&gt;&lt;BR /&gt;If you have late-arriving data, you can use watermarks to prevent reprocessing historical rows.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 3: Use DLT Expectations for Quality&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 4: Materialize to Your Catalog&lt;/STRONG&gt;&lt;BR /&gt;Make sure your DLT pipeline is writing to your own Unity Catalog schema.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;DLT Handles Incrementals for You&lt;/STRONG&gt;&lt;BR /&gt;You don't need to manually track last_updated_at or store bookmarks; DLT uses checkpoints for streaming sources and reads only new data. However, your source table must support:&lt;BR /&gt;-- Delta format&lt;BR /&gt;-- append or CDC-compatible operations (if using the change data feed)&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;If the Source Supports Change Data Feed (CDF)&lt;/STRONG&gt;&lt;BR /&gt;Enable CDF if the source table supports it (or ask the upstream team to enable it).&lt;/P&gt;</description>
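Steps 1 through 4 of the reply can be sketched in DLT SQL. This is a minimal sketch, not code from the original post; the three-level source name `catalog.schema.sales` and the column `sale_id` are placeholders for the actual shared table:

```sql
-- Step 1: a streaming table reads the shared Delta table incrementally.
-- STREAM(...) makes DLT checkpoint its position in the source, so each
-- daily update processes only newly appended data, never the full 1B+ rows.
CREATE OR REFRESH STREAMING TABLE sales_bronze (
  -- Step 3: an expectation enforces data quality; failing rows are dropped.
  CONSTRAINT valid_sale_id EXPECT (sale_id IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT "Incremental copy of the shared sales table"
AS SELECT * FROM STREAM(catalog.schema.sales);
```

Step 4 (landing the table in your own catalog) is governed by the pipeline's target catalog and schema settings rather than by the table definition, and Step 2's watermark only becomes relevant if you later aggregate or join the stream.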
      <pubDate>Tue, 13 May 2025 16:04:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-advice-for-a-big-source-table-dlt-pipeline/m-p/119066#M45785</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-05-13T16:04:04Z</dc:date>
    </item>
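The CDF route described at the end of the reply could look like the following sketch, covering both the upstream and downstream sides. Table and column names (`catalog.schema.sales`, `sale_id`, `updated_at`) are hypothetical:

```sql
-- Upstream (source owner): enable the change data feed on the shared table
-- so downstream readers receive row-level inserts/updates/deletes.
ALTER TABLE catalog.schema.sales
  SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Downstream (your DLT pipeline): apply the changes into your own table,
-- keeping one current row per key (SCD Type 1).
CREATE OR REFRESH STREAMING TABLE sales_current;

APPLY CHANGES INTO sales_current
FROM STREAM(catalog.schema.sales)
KEYS (sale_id)
SEQUENCE BY updated_at
STORED AS SCD TYPE 1;
```

The `SEQUENCE BY` column orders the changes so out-of-order updates resolve correctly; `STORED AS SCD TYPE 2` would instead keep full history of each key.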
  </channel>
</rss>

