<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Reading multi-dimensional json files in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-multi-dimensional-json-files/m-p/18128#M11983</link>
    <description>&lt;P&gt;I've been having trouble reading a JSON file that was provided to the business with an extra nesting layer. Instead of the JSON being an:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;'array of objects' -&amp;gt; [ {}, {}, {} ]&lt;/LI&gt;&lt;LI&gt;it's an 'array of arrays of objects' -&amp;gt; [ [ {}, {}, {} ], [ {}, {}, {} ] ]&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The first reads fine with Spark's multiLine option, but the second comes back with the correct column schema while every column is just null (the actual file content looks good).&lt;/P&gt;&lt;P&gt;So far I've tried creating a custom struct schema to deal with the extra layer, but haven't had any luck getting it to work. It just returns nulls.&lt;/P&gt;&lt;P&gt;Is there something obvious that I'm missing?&lt;/P&gt;</description>
    <pubDate>Wed, 07 Dec 2022 13:31:55 GMT</pubDate>
    <dc:creator>AndriusVitkausk</dc:creator>
    <dc:date>2022-12-07T13:31:55Z</dc:date>
    <item>
      <title>Reading multi-dimensional json files</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-multi-dimensional-json-files/m-p/18128#M11983</link>
      <description>&lt;P&gt;I've been having trouble reading a JSON file that was provided to the business with an extra nesting layer. Instead of the JSON being an:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;'array of objects' -&amp;gt; [ {}, {}, {} ]&lt;/LI&gt;&lt;LI&gt;it's an 'array of arrays of objects' -&amp;gt; [ [ {}, {}, {} ], [ {}, {}, {} ] ]&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The first reads fine with Spark's multiLine option, but the second comes back with the correct column schema while every column is just null (the actual file content looks good).&lt;/P&gt;&lt;P&gt;So far I've tried creating a custom struct schema to deal with the extra layer, but haven't had any luck getting it to work. It just returns nulls.&lt;/P&gt;&lt;P&gt;Is there something obvious that I'm missing?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Dec 2022 13:31:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-multi-dimensional-json-files/m-p/18128#M11983</guid>
      <dc:creator>AndriusVitkausk</dc:creator>
      <dc:date>2022-12-07T13:31:55Z</dc:date>
    </item>
    <item>
      <title>Re: Reading multi-dimensional json files</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-multi-dimensional-json-files/m-p/18129#M11984</link>
      <description>&lt;P&gt;You can use the explode function to flatten the array into rows. Can you post a simple example of your data?&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 21:20:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-multi-dimensional-json-files/m-p/18129#M11984</guid>
      <dc:creator>ashish1</dc:creator>
      <dc:date>2023-01-30T21:20:09Z</dc:date>
    </item>
  </channel>
</rss>

