Need advice for a big source table DLT Pipeline

MauricioS — Tue, 13 May 2025 15:38:33 GMT

Hi all,

I was hoping to get advice from someone with DLT Pipelines, I want to apologize in advance if this is a noob question, I'm really new into DLT, materialized views and streaming tables

I have the following scenario, my source is a big sales delta table (1B+ records) that is being shared with my team via unity catalog and I want to ingest this table into my own catalog with updates every day.

What would the best practice approach to doing this via DLT Pipeline, I ask this because doing my research I still can't wrap my head around incremental loads with DLT pipelines, mostly because I don't want to do a full refresh on a 1B+ record table each day.

Let me know if more details are needed, thanks a lot in advance!

Re: Need advice for a big source table DLT Pipeline

lingareddy_Alva — Tue, 13 May 2025 16:04:04 GMT

Hi @MauricioS

Absolutely not a noob question — you're touching on a common and important challenge in DLT pipelines,
especially when dealing with large shared Delta tables and incremental ingestion from Unity Catalog sources.

Let’s break it down so it’s simple, scalable, and DLT-native.

Ingest from a shared Delta table (Unity Catalog) into your own catalog, incrementally, with daily updates, using DLT.

Best Practice with DLT Pipelines (Incremental Load)

Step 1: Use STREAMING LIVE TABLE to Enable Incremental Load
DLT supports incremental ingestion natively via streaming reads, even if the source table is not a streaming table.
DLT tracks offsets/checkpoints automatically, so you don't reprocess old data.

Step 2: Optional Watermark for Late Records
If you have late-arriving data, you can use watermarks to prevent reprocessing historical rows:

Step 3: Use DLT Expectations for Quality

Step 4: Materialize to Your Catalog
Make sure your DLT pipeline is writing to your own Unity Catalog schema:

DLT Handles Incrementals for You
You don’t need to manually track last_updated_at or store bookmarks — DLT uses checkpoints for streaming sources and only reads new data.
However, your source table must support:
-- Delta format
-- Append or CDC-compatible operations (if using change_data_feed = true)

If Source Supports Change Data Feed (CDF):
Enable CDF if the source table supports it (or ask the upstream team to enable):

topic Need advice for a big source table DLT Pipeline in Data Engineering

Need advice for a big source table DLT Pipeline

Re: Need advice for a big source table DLT Pipeline