- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
11-14-2025 04:34 AM
Hi everyone,
I’m building a Medallion-style pipeline on Databricks for nested JSON API responses and I’d like advice on the design of an intermediate “exploded” step:
Can I avoid materializing it as a table?
If not, how should I name/classify it in the Medallion model?
Current pattern (DLT + SQL)
Landing (Volumes)
I extract API responses as JSON files and land them in a Databricks Volume.
Bronze (raw JSON → table)
A DLT pipeline (SQL) reads those JSONs and creates a bronze streaming table.
The goal is to store them “as they are,” just normalizing JSON into columns.
One of the fields is an array of nested records, but in bronze I keep this field as a STRING (the raw JSON text of the array).
Schema evolution and _rescued_data are enabled.
Exploded / normalized (intermediate)
In the next DLT SQL step, I parse that string as an ARRAY with an explicit schema and explode it.
I also flatten nested structs and cast types.
This is currently a streaming table, i.e., a materialized step.
Silver with CDC
On top of the exploded table, I apply a CDC pattern using the ingestion timestamp of the JSON as the sequence column.
For each key (e.g. entity_id, date), I keep only the row with the most recent ingestion timestamp, so I get the latest and most up-to-date metric for that date.
This produces a “clean” silver table with one record per key/date.
So overall:
raw JSON (Volume) → bronze (raw-ish) → exploded/normalized → CDC-consolidated silver
What I’d like to understand
Can I avoid a materialized exploded table?
Concretely, is it possible / a better approach to:Implement the explode + normalization as a materialized view instead of a table?
Use Python in DLT to keep this as a logical/view-like transformation rather than a persisted table?
Push the explode directly into the silver/CDC step, so there is no separate exploded object at all?
(I'm not sure which of these options are actually supported/best in practice, so I’m asking both “is it possible?” and “is it better?”)
If I do keep it, how would you classify/name it in Medallion terms?
The exploded table is:No longer raw JSON (so not pure bronze),
Not yet the final silver state (it’s pre-CDC).
Would you typically treat this as:
a “bronze_normalized” / “bronze_prepared” table,
a generic “staging” layer between bronze and silver, or
a “silver_staging” table that conceptually belongs to silver?
I’m looking for patterns others use for:
Volumes with raw JSON → DLT bronze → explode nested array → CDC silver (latest per key/date)
…both in terms of:
whether to materialize the exploded step at all, and
how to name and position that step in a clean Medallion design.
Thanks in advance for any insights!