<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Complex Json file Flatten Dynamically in Databricks Free Edition Help</title>
    <link>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150587#M703</link>
    <description>&lt;P&gt;Thanks for your response! Could you please share a high-level sudo code example that explains the config table structure and how to define patterns in a Delta table? Additionally, I'd like to understand how to flatten complex JSON data using a metadata-driven approach.&lt;/P&gt;&lt;P&gt;Currently, I've implemented a flattening logic using a recursive function that processes the entire JSON file, but it's not working as expected for some complex scenarios. I'd also like to restrict the child tables, as not all scenarios require all tables, and this should be a config-driven approach.&lt;/P&gt;&lt;P&gt;If you have any reference sudo code or approach, it would be really helpful. Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Wed, 11 Mar 2026 12:15:03 GMT</pubDate>
    <dc:creator>SantiNath_Dey</dc:creator>
    <dc:date>2026-03-11T12:15:03Z</dc:date>
    <item>
      <title>Complex Json file Flatten Dynamically</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150367#M701</link>
      <description>&lt;P&gt;Hi ,&lt;/P&gt;&lt;DIV&gt;We have multiple complex JSON files and need to flatten them, especially handling &lt;STRONG&gt;&lt;STRONG&gt;array data types. Whenever an array is present, we need to create a &lt;STRONG&gt;new child table and establish a &lt;STRONG&gt;relationship between the master and child tables. This will follow a &lt;STRONG&gt;metadata-driven architecture. Additionally, we have &lt;STRONG&gt;different JSON formats, so the flattening logic must also work &lt;STRONG&gt;based on patterns. Could you please help with this? It’s an advanced requirement, and your support would be greatly appreciated.&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;Pattern A — Customer → Orders (array) → OrderLines (nested array)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp; "sourceSystem": "CDS",&lt;BR /&gt;&amp;nbsp; "ingestionTs": "2026-03-09T09:20:10Z",&lt;BR /&gt;&amp;nbsp; "customer": {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "customerId": "CUST-1001",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "name": "Amit Das",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "email": "amit.das@example.com",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "address": {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "city": "Kolkata",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "state": "WB",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "pin": "700001"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; },&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "phones": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; { "type": "mobile", "number": "+91-9000000001" },&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; { "type": "office", "number": "+91-3300000002" }&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ],&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "orders": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "orderId": "ORD-501",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "orderDate": "2026-03-01",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "status": "SHIPPED",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "orderLines": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; { "lineId": 1, "sku": "SKU-01", "qty": 2, "price": 120.5 },&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; { "lineId": 2, "sku": "SKU-02", "qty": 1, "price": 999.0 }&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ],&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "payments": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; { "paymentId": "PAY-901", "method": "UPI", "amount": 1240.0 }&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; },&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "orderId": "ORD-502",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "orderDate": "2026-03-05",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "status": "CREATED",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "orderLines": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; { "lineId": 1, "sku": "SKU-03", "qty": 3, "price": 10.0 }&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ],&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "payments": []&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ]&lt;BR /&gt;&amp;nbsp; }&lt;BR /&gt;}&lt;BR /&gt;``&lt;BR /&gt;Tables typically created:&lt;/P&gt;&lt;P&gt;customer_master (PK: customerId)&lt;BR /&gt;customer_phones (FK: customerId)&lt;BR /&gt;customer_orders (PK: orderId, FK: customerId)&lt;BR /&gt;order_lines (FK: orderId)&lt;BR /&gt;order_payments (FK: orderId)&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 09 Mar 2026 07:16:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150367#M701</guid>
      <dc:creator>SantiNath_Dey</dc:creator>
      <dc:date>2026-03-09T07:16:58Z</dc:date>
    </item>
    <item>
      <title>Re: Complex Json file Flatten Dynamically</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150561#M702</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/219024"&gt;@SantiNath_Dey&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Given you’re dealing with multiple complex JSON structures, I agree with a&amp;nbsp;metadata‑driven approach rather than&amp;nbsp;hard‑coding the flattening logic.&lt;/P&gt;
&lt;P&gt;At a high level:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;Define a small set of config tables:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;One for patterns... as in which JSON shape/source you’re handling, with a detection rule.&lt;/LI&gt;
&lt;LI&gt;One for entities per pattern... which relational tables you want.. for example, customer, orders, order_lines, etc.&lt;/LI&gt;
&lt;LI&gt;Perhaps another one for the entity explode paths?&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Build a generic flattening/orchestration layer that:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Reads the raw JSON.&lt;/LI&gt;
&lt;LI&gt;Uses the pattern config to pick the right pattern.&lt;/LI&gt;
&lt;LI&gt;For each entity in that pattern, follow the configured explode paths (arrays --&amp;gt; child tables) and select the configured columns, including parent keys as FKs.&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The initial config could feel a bit heavier, but it scales really well. When a new JSON pattern shows up, you only add rows to the config tables (new pattern, entities, and exploded paths). You don’t touch the flattening/orchestration code at all.&lt;/P&gt;
&lt;P&gt;Does that give you a rough idea to start with?&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Mar 2026 09:11:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150561#M702</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-03-11T09:11:20Z</dc:date>
    </item>
    <item>
      <title>Re: Complex Json file Flatten Dynamically</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150587#M703</link>
      <description>&lt;P&gt;Thanks for your response! Could you please share a high-level sudo code example that explains the config table structure and how to define patterns in a Delta table? Additionally, I'd like to understand how to flatten complex JSON data using a metadata-driven approach.&lt;/P&gt;&lt;P&gt;Currently, I've implemented a flattening logic using a recursive function that processes the entire JSON file, but it's not working as expected for some complex scenarios. I'd also like to restrict the child tables, as not all scenarios require all tables, and this should be a config-driven approach.&lt;/P&gt;&lt;P&gt;If you have any reference sudo code or approach, it would be really helpful. Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Wed, 11 Mar 2026 12:15:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150587#M703</guid>
      <dc:creator>SantiNath_Dey</dc:creator>
      <dc:date>2026-03-11T12:15:03Z</dc:date>
    </item>
    <item>
      <title>Re: Complex Json file Flatten Dynamically</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150633#M709</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/219024"&gt;@SantiNath_Dey&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;I may not be able to deliver production-grade working code, but I'll walk you through my approach. It's a long post, but no point giving you a concise response that doesn't convey the message.&lt;/P&gt;
&lt;P&gt;Let's start with the architecture.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Flattener_Architecture.jpg" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/24729i27F32B8C4A9F76C4/image-size/large?v=v2&amp;amp;px=999" role="button" title="Flattener_Architecture.jpg" alt="Flattener_Architecture.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;SPAN class="s1"&gt;The framework has three layers, all driven by config.&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI class="p2"&gt;&lt;SPAN class="s1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT color="#CC99FF"&gt;&lt;STRONG&gt;Detection Layer&lt;/STRONG&gt;&lt;/FONT&gt;... When raw JSON arrives (from a UC Volume or a Bronze layer), the engine reads pattern_config and evaluates each detection_rule (a SQL expression like &lt;SPAN&gt;sourceSystem = 'CDS' AND customer IS NOT NULL&lt;/SPAN&gt;&lt;SPAN&gt;) against the DataFrame. First match wins. This identifies which pattern to apply&amp;nbsp;&lt;/SPAN&gt;and whether to hoist a nested struct as the root (e.g.,&amp;nbsp;&lt;SPAN&gt;customer&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p2"&gt;&lt;STRONG&gt;&lt;FONT color="#CC99FF"&gt;Orchestration &lt;/FONT&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&lt;FONT color="#CC99FF"&gt;Layer.&lt;/FONT&gt;&lt;/STRONG&gt;..&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;Once the pattern is identified, the engine pulls all entities from &lt;/SPAN&gt;&lt;SPAN&gt;entity_config&lt;/SPAN&gt;&lt;SPAN&gt; for that &lt;/SPAN&gt;&lt;SPAN&gt;pattern_id&lt;/SPAN&gt;&lt;SPAN&gt;, ordered by&amp;nbsp;&lt;/SPAN&gt;entity_order&lt;SPAN&gt;. This guarantees parents are processed before children. Each entity row defines the target table name, PK (natural or&amp;nbsp;&lt;/SPAN&gt;auto-generated), FK, and its parent in the hierarchy.&lt;/LI&gt;
&lt;LI class="p2"&gt;&lt;FONT color="#CC99FF"&gt;&lt;STRONG&gt;Flattening &lt;/STRONG&gt;&lt;/FONT&gt;&lt;SPAN&gt;&lt;FONT color="#CC99FF"&gt;&lt;STRONG&gt;Layer.&lt;/STRONG&gt;&lt;/FONT&gt;..&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;For each entity, the engine reads &lt;/SPAN&gt;&lt;SPAN&gt;entity_explode_config&lt;/SPAN&gt;&lt;SPAN&gt; to get the explode plan. It walks the &lt;/SPAN&gt;&lt;SPAN&gt;explode_path&lt;/SPAN&gt;&lt;SPAN&gt; (e.g.,&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN&gt;orders.orderLines&lt;/SPAN&gt;&lt;SPAN&gt;), exploding each intermediate array segment, flattens any configured structs inline (e.g.,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;address&lt;/SPAN&gt; &lt;SPAN&gt;-&amp;gt;&lt;/SPAN&gt; &lt;SPAN&gt;city,&lt;/SPAN&gt; &lt;SPAN&gt;state,&lt;/SPAN&gt; &lt;SPAN&gt;pin&lt;/SPAN&gt;&lt;SPAN&gt;), selects&amp;nbsp;&lt;/SPAN&gt;only the configured columns, and injects the FK from the parent entity. The resulting DataFrame is deduplicated and registered as an output table.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;For the output, all entity DataFrames are written as Delta tables to the target Unity Catalog schema. A _metadata_catalog table is also generated. One&amp;nbsp;row per column across all output tables, capturing table name, column name, data type, PK/FK flags, and FK references. This serves as a built-in&amp;nbsp;lineage/governance layer.&lt;/P&gt;
&lt;P&gt;The key point is that the three config tables at the top of the diagram are the only things that change when a new JSON format arrives. The&amp;nbsp;engine on the left (the three layers I've described above) never changes.&lt;/P&gt;
&lt;P&gt;You asked about how the config tables should be set up. The three config tables will look something like the below. This is just an example. Modify it as required.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Pattern_Config:&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;One row per JSON shape. The detection_rule is a SQL boolean expression evaluated against the raw DataFrame to auto-match the right pattern.&lt;/SPAN&gt;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="118.594px" height="50px"&gt;
&lt;P&gt;&lt;STRONG&gt;Column&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="71.4453px" height="50px"&gt;
&lt;P&gt;&lt;STRONG&gt;Type&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="534.211px" height="50px"&gt;
&lt;P&gt;&lt;STRONG&gt;Purpose&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="118.594px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;pattern_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="71.4453px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="534.211px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;Unique ID (e.g. P001)&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="118.594px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;pattern_name&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="71.4453px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="534.211px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;Human-readable name&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="118.594px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;source_system&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="71.4453px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="534.211px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;Origin system identifier&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="118.594px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;detection_rule&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="71.4453px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="534.211px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;SQL expression, e.g. sourceSystem = 'CDS' AND customer IS NOT NULL&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="118.594px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;root_path&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="71.4453px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="534.211px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;Struct to hoist as root (e.g. customer), or NULL&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="118.594px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;description&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="71.4453px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="534.211px" height="50px"&gt;
&lt;P&gt;&lt;SPAN&gt;Free text&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&lt;STRONG&gt;Entity Config:&amp;nbsp;&lt;/STRONG&gt;One row per target table. Defines the relational model...parent-child hierarchy, keys, and processing order.&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Column&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Type&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Purpose&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;pattern_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Links to pattern_config&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;entity_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Unique ID (e.g. E001)&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;entity_name&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Target Delta table name&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;parent_entity_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Parent entity (NULL for root)&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;pk_column&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Primary key column name&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;pk_auto_generate&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;BOOLEAN&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;True = engine generates surrogate PK&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;fk_column&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Foreign key column name&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;fk_references&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Parent table.column reference&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;entity_order&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;INT&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Processing sequence (parents first)&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&lt;STRONG&gt;Entity Explode Config:&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;One row per entity. Tells the engine how to navigate the JSON tree and which columns to extract.&lt;/SPAN&gt;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Column&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Type&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Purpose&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;entity_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Links to entity_config&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;explode_path&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Dot-path to the array (e.g. orders.orderLines), NULL for root&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;explode_type&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;root, array_of_struct, or nested_array&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;select_columns&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Comma-separated column list for this entity&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;flatten_structs&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Struct flattening rule: struct_col -&amp;gt; field1, field2&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;alias_map&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;STRING&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Column renames (optional)&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;Let's now talk about the ERD for the example you gave. I'm using your example as the basis for the demonstration.&amp;nbsp;&lt;SPAN class="s1"&gt;I'm creating 5 normalised tables from a single nested JSON document as you wanted.&amp;nbsp;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;&lt;STRONG&gt;customer_master&lt;/STRONG&gt; is the root table, keyed on &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;customerId&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;. It holds the scalar customer fields plus the flattened address struct (&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;address__city&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;address__state&lt;/SPAN&gt;&lt;SPAN class="s3"&gt;, &lt;/SPAN&gt;&lt;SPAN class="s1"&gt;address__pin&lt;/SPAN&gt;&lt;SPAN class="s3"&gt;) and the top-level metadata columns (&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;sourceSystem&lt;/SPAN&gt;&lt;SPAN class="s3"&gt;, &lt;/SPAN&gt;&lt;SPAN class="s1"&gt;ingestionTs&lt;/SPAN&gt;&lt;SPAN class="s3"&gt;).&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;&lt;STRONG&gt;customer_phones&lt;/STRONG&gt; and &lt;STRONG&gt;customer_orders&lt;/STRONG&gt; are direct children of customer_master, both linked via &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;customerId&lt;/SPAN&gt;&lt;SPAN class="s1"&gt; as FK. Phones has an auto-generated&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;surrogate PK since the source array has no natural key. Orders use&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;orderId&lt;/SPAN&gt;&lt;SPAN class="s1"&gt; as a natural PK.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;&lt;STRONG&gt;order_lines&lt;/STRONG&gt; and &lt;STRONG&gt;order_payments&lt;/STRONG&gt; sit one level deeper, both linked to customer_orders via &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;orderId&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;. These come from nested arrays &lt;I&gt;within&lt;/I&gt; the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;orders array. When you run the script, the engine walks the dotted path &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;orders.orderLines&lt;/SPAN&gt;&lt;SPAN class="s1"&gt; and &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;orders. payments&lt;/SPAN&gt;&lt;SPAN class="s1"&gt; automatically, exploding each intermediate array along the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;way.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;All relationships are 1-to-many (one customer has many phones/orders, one order has many lines/payments). The PK/FK pairs are fully defined in&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;entity_config&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;, so the same ER structure can be reproduced from config alone... no hardcoded joins in the engine.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Flattener_ERD.jpg" style="width: 697px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/24731i838A2E32C2E4A81F/image-size/large?v=v2&amp;amp;px=999" role="button" title="Flattener_ERD.jpg" alt="Flattener_ERD.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;From a coding perspective, here is how I'll approach it. This is just pseudo code as requested. Use this as reference to build what you need.&lt;/P&gt;
&lt;P&gt;Step 1 is to load the config..&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN&gt;patterns&amp;nbsp; &amp;nbsp; = read_table("pattern_config")&amp;nbsp; &amp;nbsp; &amp;nbsp; -- all patterns&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;entities&amp;nbsp; &amp;nbsp; = read_table("entity_config")&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; -- ordered by entity_order&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;explode_cfg = read_table("entity_explode_config") -- keyed by entity_id&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;P&gt;Step 2 is to read the JSON and detect the pattern.&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN&gt;df_raw = spark.read.json(input_path)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;FOR each pattern IN patterns:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;IF df_raw.filter(expr(pattern.detection_rule)).count &amp;gt; 0:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;matched_pattern = pattern&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BREAK&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;IF no match:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;RAISE "No pattern matched"&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;Step 3 is to apply root path&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN&gt;IF matched_pattern.root_path IS NOT NULL:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- Preserve top-level scalar metadata (sourceSystem, ingestionTs)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;meta_cols = [scalar columns that are NOT the root_path struct]&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df_hoisted = df_raw.select(meta_cols, root_path.*)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;ELSE:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df_hoisted = df_raw&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;Step 4 is where you process each entity..&lt;/P&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;PRE&gt;&lt;SPAN&gt;FOR each entity IN entities WHERE entity.pattern_id = matched_pattern.id:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;exp = explode_cfg[entity.entity_id]&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- 4a. Resolve explode context&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;IF exp.explode_type == "root":&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;working_df = df_hoisted&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ELSE:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;working_df = df_hoisted&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;FOR each segment IN exp.explode_path.split("."):&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;IF segment is ArrayType:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;working_df = working_df.explode_outer(segment)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- expand struct fields if element is struct&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ELIF segment is StructType:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;working_df = working_df.select(segment.*)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- 4b. Flatten structs (if configured)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;IF exp.flatten_structs IS NOT NULL:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;parse "struct_col -&amp;gt; field1, field2, field3"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;FOR each field:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;working_df = working_df.withColumn(&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"struct_col__field", col("struct_col.field")&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- 4c. Select configured columns + FK&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;final_cols = []&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;IF entity.fk_column:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;final_cols.append(entity.fk_column)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;final_cols.extend(parse(exp.select_columns))&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;entity_df = working_df.select(final_cols)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- 4d. Auto-generate PK if needed&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;IF entity.pk_auto_generate:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;entity_df = entity_df.withColumn(&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;entity.pk_column,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;concat(fk_col, "_", monotonically_increasing_id)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- 4e. Deduplicate and register&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;output_tables[entity.entity_name] = entity_df.dropDuplicates()&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;Step is where you write the output..&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN&gt;FOR each table_name, df IN output_tables:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df.write.format("delta")&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.mode("overwrite")&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.saveAsTable("catalog.output_schema.{table_name}")&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;-- Also write a metadata catalog table&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;metadata_df = build_metadata(entities, output_tables)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;metadata_df.write ... saveAsTable("catalog.output_schema._metadata_catalog")&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;I hope this covers what you were looking for. Good luck!&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Mar 2026 22:14:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150633#M709</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-03-11T22:14:52Z</dc:date>
    </item>
    <item>
      <title>Re: Complex Json file Flatten Dynamically</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150655#M710</link>
      <description>&lt;P&gt;Thanks for the quick response!&amp;nbsp;I'm going to dive in and start building the implementation now.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2026 05:43:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/complex-json-file-flatten-dynamically/m-p/150655#M710</guid>
      <dc:creator>SantiNath_Dey</dc:creator>
      <dc:date>2026-03-12T05:43:09Z</dc:date>
    </item>
  </channel>
</rss>

