Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
How to handle exploded records with overwrite-by-key logic in Delta Live Tables

Dewlap
New Contributor II

 

I'm using Delta Live Tables (DLT) with the apply_changes API to manage SCD Type 1 on a source table. However, I’ve run into a limitation.
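
For context, my apply_changes setup looks roughly like this (everything except key_id is a placeholder, and this is a simplified sketch of the pipeline):

```python
import dlt
from pyspark.sql import functions as F

# SCD Type 1 target maintained by apply_changes.
dlt.create_streaming_table("silver_source")

dlt.apply_changes(
    target="silver_source",
    source="bronze_source",          # placeholder CDC feed name
    keys=["key_id"],
    sequence_by=F.col("event_ts"),   # placeholder ordering column
    stored_as_scd_type=1,
)
```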

Context:

After apply_changes, I have a derived view that:

  • Flattens and explodes a JSON array field in the source data (a sketch of this view follows the list).

  • Produces multiple rows per original primary key (key_id).

  • Needs to be written to a destination Delta table, such that:

    • If records already exist for a key_id, they should be fully replaced (delete existing rows and insert the new exploded ones).
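
A minimal sketch of that derived view (only key_id comes from my real schema; the table name, JSON column, and item fields are placeholders):

```python
import dlt
from pyspark.sql import functions as F

@dlt.view()
def exploded_items():
    # Read the SCD1 output maintained by apply_changes.
    df = dlt.read("silver_source")
    # Parse the JSON array, then explode it: one output row per array
    # element, so each key_id now maps to multiple rows.
    items = F.from_json("items_json", "array<struct<sku:string, qty:int>>")
    return (
        df.withColumn("item", F.explode(items))
          .select("key_id", "item.sku", "item.qty")
    )
```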

Challenge:

apply_changes applies changes one row per key, so it doesn't fit my case, where a single incoming key corresponds to multiple exploded rows. This makes it hard to cleanly replace an existing group of records in the target.

Question:

Is there a native way (or recommended pattern) in Databricks/DLT to support this kind of "overwrite-by-key" logic, where we delete all rows by key and insert multiple transformed rows, preferably in a streaming or near-real-time pipeline?

1 REPLY

Brahmareddy
Esteemed Contributor

Hi Dewlap,

How are you doing today? You're right that apply_changes in DLT works best for one-row-per-key updates and doesn't fit well when you need to replace multiple rows for the same key, especially after exploding arrays.

A good workaround is to break this into two steps: first, write your transformed and exploded records into a staging table or view; then, in a downstream step, delete all existing records from the target table that match the incoming keys and insert the new rows. This "overwrite-by-key" pattern gives you better control and suits your use case well. DLT doesn't have a native function for it, but handling it with custom logic and a staging layer (especially in a silver-to-gold setup) is a practical and commonly used solution.

If you're running in streaming mode, you can use a foreachBatch block to manage the deletes and inserts safely. Let me know if you'd like a more detailed example; a quick sketch follows below.
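
Here's a minimal sketch of that foreachBatch pattern, run as a standalone Structured Streaming job rather than inside the DLT pipeline (table names and the checkpoint path are illustrative; key_id is your key column, and spark is the Databricks-provided session):

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

TARGET = "main.gold.exploded_items"  # illustrative target table

def overwrite_by_key(microbatch_df, batch_id):
    # Replace whole key groups: delete every target row whose key_id
    # appears in this micro-batch, then append the new exploded rows.
    # Delete-then-append is idempotent per key, so a replayed
    # micro-batch converges to the same result.
    keys = [r["key_id"] for r in microbatch_df.select("key_id").distinct().collect()]
    if keys:
        DeltaTable.forName(spark, TARGET).delete(F.col("key_id").isin(keys))
    microbatch_df.write.format("delta").mode("append").saveAsTable(TARGET)

(spark.readStream.table("main.silver.exploded_staging")  # staged exploded rows
      .writeStream
      .foreachBatch(overwrite_by_key)
      .option("checkpointLocation", "/Volumes/main/gold/_checkpoints/exploded_items")
      .start())
```

Collecting the distinct keys to the driver keeps the sketch simple; for very large micro-batches, a join-based delete or a MERGE would scale better.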

Regards,

Brahma
