Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Need advice for a big source table DLT Pipeline

MauricioS
New Contributor III

Hi all,

I was hoping to get advice from someone with experience in DLT pipelines. I want to apologize in advance if this is a noob question; I'm really new to DLT, materialized views and streaming tables.

I have the following scenario: my source is a big sales Delta table (1B+ records) that is being shared with my team via Unity Catalog, and I want to ingest this table into my own catalog with updates every day.

What would be the best-practice approach to doing this via a DLT pipeline? I ask because, even after doing my research, I still can't wrap my head around incremental loads with DLT pipelines, mostly because I don't want to do a full refresh of a 1B+ record table each day.

Let me know if more details are needed, thanks a lot in advance!

1 REPLY

lingareddy_Alva
Honored Contributor II

Hi @MauricioS 

Absolutely not a noob question; you're touching on a common and important challenge in DLT pipelines,
especially when dealing with large shared Delta tables and incremental ingestion from Unity Catalog sources.

Let's break it down so it's simple, scalable, and DLT-native.

The goal: ingest from a shared Delta table (Unity Catalog) into your own catalog, incrementally, with daily updates, using DLT.

Best Practice with DLT Pipelines (Incremental Load)

Step 1: Use STREAMING LIVE TABLE to Enable Incremental Load
DLT supports incremental ingestion natively via streaming reads, even if the source table is not a streaming table.
DLT tracks offsets/checkpoints automatically, so you don't reprocess old data.
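A minimal sketch of what that can look like in Python (the catalog, schema and table names are placeholders for your shared table; the same thing can be expressed in SQL with CREATE OR REFRESH STREAMING TABLE ... AS SELECT * FROM STREAM(...)):

import dlt

@dlt.table(
    name="sales_bronze",
    comment="Incremental copy of the shared sales table"
)
def sales_bronze():
    # Streaming read of the shared Delta table: DLT checkpoints how far it
    # has read, so each daily (triggered) run only processes newly arrived data.
    return spark.readStream.table("shared_catalog.sales_schema.sales")

Run the pipeline in triggered mode on a daily schedule and it behaves like an incremental batch job.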

Step 2: Optional Watermark for Late Records
If you have late-arriving data, a watermark tells the stream how long to keep waiting for it; this mainly matters if you do stateful processing (aggregations, deduplication, stream-stream joins) downstream:
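For instance, assuming the source has an event-time column (sale_timestamp is a made-up name here):

import dlt

@dlt.table(name="sales_bronze")
def sales_bronze():
    return (
        spark.readStream.table("shared_catalog.sales_schema.sales")
        # Treat events more than 2 days late as too old for any stateful
        # operators (aggregations, dedup) further down the pipeline.
        .withWatermark("sale_timestamp", "2 days")
    )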

Step 3: Use DLT Expectations for Quality
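For example, a couple of illustrative rules (the column names are made up; adapt them to your schema):

import dlt

@dlt.table(name="sales_clean")
@dlt.expect_or_drop("valid_sale_id", "sale_id IS NOT NULL")   # drop rows that fail
@dlt.expect("non_negative_amount", "amount >= 0")             # keep rows, but track violations
def sales_clean():
    # Read the bronze streaming table defined earlier in the same pipeline.
    return dlt.read_stream("sales_bronze")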

Step 4: Materialize to Your Catalog
Make sure your DLT pipeline is writing to your own Unity Catalog schema:
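With a Unity Catalog-enabled pipeline this is set in the pipeline settings (default catalog plus target schema) rather than in the table code. After a run you can sanity-check that the table landed where you expect (names below are placeholders):

# Run in a notebook after the pipeline completes
display(spark.table("my_catalog.my_schema.sales_bronze").limit(10))
spark.sql("DESCRIBE DETAIL my_catalog.my_schema.sales_bronze").show(truncate=False)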

DLT Handles Incrementals for You
You don't need to manually track last_updated_at or store bookmarks; DLT uses checkpoints for streaming sources and only reads new data.
However, your source table must:
- be in Delta format
- receive append-only or CDC-compatible writes (CDC requires delta.enableChangeDataFeed = true on the source)
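A quick, read-only way to check the shared source against these requirements (the table name is a placeholder; depending on how the table is shared, DESCRIBE HISTORY may require extra privileges):

# Format and size of the shared table
spark.sql("DESCRIBE DETAIL shared_catalog.sales_schema.sales").show(truncate=False)

# Is Change Data Feed already enabled?
spark.sql("SHOW TBLPROPERTIES shared_catalog.sales_schema.sales").show(truncate=False)

# Are writes mostly appends, or are there updates/deletes/merges?
spark.sql("DESCRIBE HISTORY shared_catalog.sales_schema.sales") \
    .select("timestamp", "operation").show(20, truncate=False)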


If the Source Supports Change Data Feed (CDF):
Enable CDF if the source table supports it (or ask the upstream team to enable it):
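A sketch of the consumer side once CDF is on (the ALTER TABLE is run by the table owner; names are placeholders):

import dlt

@dlt.table(name="sales_changes_bronze")
def sales_changes_bronze():
    # Upstream (the table owner) enables CDF once, e.g.:
    #   ALTER TABLE shared_catalog.sales_schema.sales
    #   SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
    # The stream then carries row-level changes, including the
    # _change_type, _commit_version and _commit_timestamp columns.
    return (
        spark.readStream
        .option("readChangeFeed", "true")
        .table("shared_catalog.sales_schema.sales")
    )

From there you can use dlt.apply_changes (APPLY CHANGES INTO) to merge updates and deletes into a target table keyed on your sale id, instead of treating everything as an append.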

LR
