Incremental Load Without any keys

Rupa0503 — Sat, 16 May 2026 17:11:16 GMT

so am performing incremental load where i want to insert or update data but there are no date columns or any keys how can i do the incremental load from silver to gold layer

Re: Incremental Load Without any keys

Sumit_7 — Sun, 17 May 2026 06:11:02 GMT

@Rupa0503 The correct approach is to have a key identifier to match and merge the records. You could also do a full load (expensive but works). Databricks official recommendation is Delta Change Data Feed.

Re: Incremental Load Without any keys

balajij8 — Sun, 17 May 2026 08:29:02 GMT

@Rupa0503

You can follow below

Key Hashing & Merge - You can create a synthetic key by concatenating all data columns in a row and applying a hashing algorithm (MD5 or SHA-2) within the Silver layer. The hash acts as a unique fingerprint for that exact snapshot of data. When moving data to Gold, you can use a MERGE operation matching solely on this generated hash. If the hash already exists in Gold, the row can be ignored as a duplicate. If it does not exist, you can treat it as a new entry. If an upstream entity changes, it generates a brand new hash which naturally appends to Gold as a new row version preserving historical states without requiring a traditional update key.
Structured Streaming - If your Silver layer is strictly append-only (new data arrives continuously but existing records are never modified or rewritten upstream), you can bypass batch processing entirely in favor of Structured Streaming. Databricks Structured Streaming utilizes internal metadata checkpoints to track exactly which data files have been processed. The engine automatically handles the incremental logic behind the scenes by reading only the new files populated in the Silver table since the last trigger. Because the framework relies on file level tracking rather than data/column level, it successfully pushes incremental updates to Gold without needing a key or timestamp.
Change Data Feed (CDF) - If the Silver layer does experience modifications (such as overwrites or merges) and you cannot rely on a simple append stream, you can enable Delta Lake’s Change Data Feed (CDF) on the Silver table. CDF automatically isolates row level mutations by exposing metadata columns (whether a row is a new insertion or the post image of an update). By streaming from the Change Data Feed, you can pull the exact rows that changed during the latest commit minimizing data scanning costs, allowing to feed a set of modified rows into your Gold logic without scanning full data.

Re: Incremental Load Without any keys

pragya17 — Wed, 27 May 2026 08:00:46 GMT

@Rupa0503 Watermark doesn't need a date column from silver/gold. You can use pipeline run timestamp as the watermark — tracking when the pipeline last ran, not when data was modified. Also you can use Delta table version history within watermark as no date column will be needed .

Second , you can use Delta Change Data Feed - best for no date column and row level changes .

topic Re: Incremental Load Without any keys in Get Started Discussions

Incremental Load Without any keys

Re: Incremental Load Without any keys

Re: Incremental Load Without any keys

Re: Incremental Load Without any keys