<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Medallion Architecture: do I need a materialized “exploded” layer (raw JSON → exploded → CDC)? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/medallion-architecture-do-i-need-a-materialized-exploded-layer/m-p/139090#M51091</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/169029"&gt;@guidotognini&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Can You Avoid Materializing Exploded Data?&lt;/P&gt;&lt;P&gt;Materialized Views: If your downstream silver table is itself streaming and supports materialized views, you may be able to collapse the explode+normalize step into a view that directly transforms bronze to silver without a separate persisted table&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;DLT with Python or SQL can chain transformations without persisting each intermediate to storage. However, skipping materialization risks incomplete lineage, difficult debugging, and slower downstream query performance&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;One can push explode+normalize directly into the silver table transformation, with CDC logic applied in a single step. This reduces the number of tables, but may increase complexity and reduce auditability; if you ever need to reprocess or troubleshoot, having the exploded step materialized helps&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Suggested actions:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Materialize the exploded/normalized step as a managed silver staging table if reliability, traceability, and modularity are priorities.&lt;/LI&gt;&lt;LI&gt;If your pipeline is simple, rarely needs isolated queries on the exploded array, and you don't require stepwise time-travel/debugging, consider using a view or merging normalization into the CDC silver step.&lt;/LI&gt;&lt;LI&gt;Use Medallion naming conventions that clearly indicate the purpose (e.g., "silver_staging" or "silver_normalized").&lt;/LI&gt;&lt;LI&gt;Document lineage to ensure that downstream consumers understand which objects are audited system of record (bronze) and which are computable/corrected intermediates (silver)&lt;A href="https://www.linkedin.com/pulse/flattening-json-data-databricks-downstream-processing-avinash-narala-82zuc/" target="_blank" rel="noopener"&gt;https://www.linkedin.com/pulse/flattening-json-data-databricks-downstream-processing-avinash-narala-82zuc/&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;A href="https://tecyfy.com/blog/medallion-architecture-best-practices-databricks-technical-deep-dive" target="_blank" rel="noopener"&gt;https://tecyfy.com/blog/medallion-architecture-best-practices-databricks-technical-deep-dive&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 14 Nov 2025 13:35:23 GMT</pubDate>
    <dc:creator>ManojkMohan</dc:creator>
    <dc:date>2025-11-14T13:35:23Z</dc:date>
    <item>
      <title>Medallion Architecture: do I need a materialized “exploded” layer (raw JSON → exploded → CDC)?</title>
      <link>https://community.databricks.com/t5/data-engineering/medallion-architecture-do-i-need-a-materialized-exploded-layer/m-p/139085#M51089</link>
      <description>&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I’m building a Medallion-style pipeline on Databricks for &lt;STRONG&gt;nested JSON API responses&lt;/STRONG&gt; and I’d like advice on the design of an intermediate &lt;STRONG&gt;“exploded” step&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Can I avoid materializing it as a table?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;If not, how should I name/classify it in the Medallion model?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;H2&gt;Current pattern (DLT + SQL)&lt;/H2&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Landing (Volumes)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;I extract API responses as JSON files and land them in a Databricks &lt;STRONG&gt;Volume&lt;/STRONG&gt;.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Bronze (raw JSON → table)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;A &lt;STRONG&gt;DLT pipeline (SQL)&lt;/STRONG&gt; reads those JSONs and creates a &lt;STRONG&gt;bronze streaming table&lt;/STRONG&gt;.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The goal is to store them “as they are,” just normalizing JSON into columns.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;One of the fields is an &lt;STRONG&gt;array of nested records&lt;/STRONG&gt;, but in bronze I keep this field as a &lt;STRONG&gt;STRING&lt;/STRONG&gt; (the raw JSON text of the array).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Schema evolution and _rescued_data are enabled.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Exploded / normalized (intermediate)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;In the next DLT SQL step, I &lt;STRONG&gt;parse that string as an ARRAY with an explicit schema&lt;/STRONG&gt; and explode it.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;I also flatten nested structs and cast types.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;This is currently a &lt;STRONG&gt;streaming table&lt;/STRONG&gt;, i.e., a materialized step.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Silver with CDC&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;On top of the exploded table, I apply a &lt;STRONG&gt;CDC pattern&lt;/STRONG&gt; using the &lt;STRONG&gt;ingestion timestamp of the JSON&lt;/STRONG&gt; as the sequence column.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;For each key (e.g. entity_id, date), I keep &lt;STRONG&gt;only the row with the most recent ingestion timestamp&lt;/STRONG&gt;, so I get the &lt;STRONG&gt;latest and most up-to-date metric for that date&lt;/STRONG&gt;.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;This produces a “clean” &lt;STRONG&gt;silver table&lt;/STRONG&gt; with one record per key/date.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;So overall:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;STRONG&gt;raw JSON (Volume) → bronze (raw-ish) → exploded/normalized → CDC-consolidated silver&lt;/STRONG&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;HR /&gt;&lt;H2&gt;What I’d like to understand&lt;/H2&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Can I avoid a materialized exploded table?&lt;/STRONG&gt;&lt;BR /&gt;Concretely, is it &lt;STRONG&gt;possible / a better approach&lt;/STRONG&gt; to:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Implement the explode + normalization as a &lt;STRONG&gt;materialized view&lt;/STRONG&gt; instead of a table?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Use &lt;STRONG&gt;Python&lt;/STRONG&gt; in DLT to keep this as a logical/view-like transformation rather than a persisted table?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Push the explode directly into the &lt;STRONG&gt;silver/CDC step&lt;/STRONG&gt;, so there is no separate exploded object at all?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;(I'm not sure which of these options are actually supported/best in practice, so I’m asking both “is it possible?” and “is it better?”)&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;If I do keep it, how would you classify/name it in Medallion terms?&lt;/STRONG&gt;&lt;BR /&gt;The exploded table is:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;No longer raw JSON (so not pure bronze),&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Not yet the final silver state (it’s pre-CDC).&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Would you typically treat this as:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;a &lt;STRONG&gt;“bronze_normalized” / “bronze_prepared”&lt;/STRONG&gt; table,&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;a generic &lt;STRONG&gt;“staging”&lt;/STRONG&gt; layer between bronze and silver, or&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;a &lt;STRONG&gt;“silver_staging”&lt;/STRONG&gt; table that conceptually belongs to silver?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I’m looking for patterns others use for:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;STRONG&gt;Volumes with raw JSON → DLT bronze → explode nested array → CDC silver (latest per key/date)&lt;/STRONG&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;…both in terms of:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;whether to materialize the exploded step at all, and&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;how to &lt;STRONG&gt;name and position&lt;/STRONG&gt; that step in a clean Medallion design.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks in advance for any insights!&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;#LakeflowDeclarativePipelines #DLT #MedallionArchitecture&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 14 Nov 2025 12:34:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/medallion-architecture-do-i-need-a-materialized-exploded-layer/m-p/139085#M51089</guid>
      <dc:creator>guidotognini</dc:creator>
      <dc:date>2025-11-14T12:34:36Z</dc:date>
    </item>
    <item>
      <title>Re: Medallion Architecture: do I need a materialized “exploded” layer (raw JSON → exploded → CDC)?</title>
      <link>https://community.databricks.com/t5/data-engineering/medallion-architecture-do-i-need-a-materialized-exploded-layer/m-p/139090#M51091</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/169029"&gt;@guidotognini&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Can You Avoid Materializing Exploded Data?&lt;/P&gt;&lt;P&gt;Materialized Views: If your downstream silver table is itself streaming and supports materialized views, you may be able to collapse the explode+normalize step into a view that directly transforms bronze to silver without a separate persisted table&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;DLT with Python or SQL can chain transformations without persisting each intermediate to storage. However, skipping materialization risks incomplete lineage, difficult debugging, and slower downstream query performance&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;One can push explode+normalize directly into the silver table transformation, with CDC logic applied in a single step. This reduces the number of tables, but may increase complexity and reduce auditability; if you ever need to reprocess or troubleshoot, having the exploded step materialized helps&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Suggested actions:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Materialize the exploded/normalized step as a managed silver staging table if reliability, traceability, and modularity are priorities.&lt;/LI&gt;&lt;LI&gt;If your pipeline is simple, rarely needs isolated queries on the exploded array, and you don't require stepwise time-travel/debugging, consider using a view or merging normalization into the CDC silver step.&lt;/LI&gt;&lt;LI&gt;Use Medallion naming conventions that clearly indicate the purpose (e.g., "silver_staging" or "silver_normalized").&lt;/LI&gt;&lt;LI&gt;Document lineage to ensure that downstream consumers understand which objects are audited system of record (bronze) and which are computable/corrected intermediates (silver)&lt;A href="https://www.linkedin.com/pulse/flattening-json-data-databricks-downstream-processing-avinash-narala-82zuc/" target="_blank" rel="noopener"&gt;https://www.linkedin.com/pulse/flattening-json-data-databricks-downstream-processing-avinash-narala-82zuc/&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;A href="https://tecyfy.com/blog/medallion-architecture-best-practices-databricks-technical-deep-dive" target="_blank" rel="noopener"&gt;https://tecyfy.com/blog/medallion-architecture-best-practices-databricks-technical-deep-dive&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Nov 2025 13:35:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/medallion-architecture-do-i-need-a-materialized-exploded-layer/m-p/139090#M51091</guid>
      <dc:creator>ManojkMohan</dc:creator>
      <dc:date>2025-11-14T13:35:23Z</dc:date>
    </item>
  </channel>
</rss>

