topic Re: How to handle exploded records with overwrite-by-key logic in Delta Live Tables in Data Engineering

How to handle exploded records with overwrite-by-key logic in Delta Live Tables

Dewlap — Fri, 27 Jun 2025 21:39:30 GMT

I'm using Delta Live Tables (DLT) with the apply_changes API to manage SCD Type 1 on a source table. However, I’ve run into a limitation.

Context:

After apply_changes, I have a derived view that:

Flattens and explodes a JSON array field in the source data.
Produces multiple rows per original primary key (key_id).
Needs to be written to a destination Delta table, such that:
- If records already exist for a key_id, they should be fully replaced (delete existing rows and insert the new exploded ones).

Challenge:

apply_changes processes one record at a time, which doesn’t suit my case, where multiple rows need to be updated for one incoming key. This makes it hard to cleanly replace existing groups of records in the target.

Question:

Is there a native way (or recommended pattern) in Databricks/DLT to support this kind of "overwrite-by-key" logic, where we delete all rows by key and insert multiple transformed rows, preferably in a streaming or near-real-time pipeline?

Re: How to handle exploded records with overwrite-by-key logic in Delta Live Tables

Brahmareddy — Fri, 27 Jun 2025 22:07:23 GMT

Hi Dewlap,

How are you doing today? As per my understanding, You're right to notice that apply_changes in DLT works best for one-row-per-key updates and doesn't fit well when you need to replace multiple rows for the same key, especially after exploding arrays. A good workaround is to break this into two steps: first, write your transformed and exploded records into a staging table or view; then, in the next step, delete all existing records from the target table that match the incoming keys and insert the new rows. This "overwrite-by-key" pattern gives you better control and is more suited for your use case. While DLT doesn’t have a native function for this, handling it in a custom logic with a staging layer (especially in a silver-to-gold setup) is a practical and often-used solution. If you're running in streaming mode, you might also consider using a foreachBatch block to manage the deletes and inserts safely. Let me know if you'd like help with a code example.

Regards,

Brahma