<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading different file structures for json files in blob stores in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/114192#M44744</link>
    <description>&lt;P&gt;If they're all JSON but have different structures, you can use the variant type.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/data-types/variant-type" target="_blank"&gt;https://docs.databricks.com/aws/en/sql/language-manual/data-types/variant-type&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;There are a few examples in this blog too:&amp;nbsp;&lt;A href="https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark" target="_blank"&gt;https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 01 Apr 2025 15:25:57 GMT</pubDate>
    <dc:creator>holly</dc:creator>
    <dc:date>2025-04-01T15:25:57Z</dc:date>
    <item>
      <title>Reading different file structures for json files in blob stores</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/114030#M44704</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;We are planning to store JSON files with mixed structures in a blob store and read them into Databricks. I am wondering whether we should have a container for each structure, or whether the various tools in Databricks can successfully read the different types. I have my doubts, since there is no real way to separate them: it's a flat file structure, regardless of how we make the files look in storage to us humans.&lt;/P&gt;&lt;P&gt;I can filter the files in a Python script, but that rules out tools like Auto Loader. Or am I missing something about how to use Auto Loader in this scenario?&lt;/P&gt;&lt;P&gt;How have others approached this?&lt;/P&gt;</description>
      <pubDate>Sun, 30 Mar 2025 22:17:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/114030#M44704</guid>
      <dc:creator>turagittech</dc:creator>
      <dc:date>2025-03-30T22:17:21Z</dc:date>
    </item>
    <item>
      <title>Re: Reading different file structures for json files in blob stores</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/114192#M44744</link>
      <description>&lt;P&gt;If they're all JSON but have different structures, you can use the variant type.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/data-types/variant-type" target="_blank"&gt;https://docs.databricks.com/aws/en/sql/language-manual/data-types/variant-type&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;There are a few examples in this blog too:&amp;nbsp;&lt;A href="https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark" target="_blank"&gt;https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Apr 2025 15:25:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/114192#M44744</guid>
      <dc:creator>holly</dc:creator>
      <dc:date>2025-04-01T15:25:57Z</dc:date>
    </item>
    <item>
      <title>Re: Reading different file structures for json files in blob stores</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/114673#M44904</link>
      <description>&lt;P&gt;This doesn't quite hit the mark: in my case, each JSON file represents a different table of data. I think multiple structures in a blob container confuse a lot of tools, which means you have to load file by file, and that is going to be the least efficient approach.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Apr 2025 05:02:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/114673#M44904</guid>
      <dc:creator>turagittech</dc:creator>
      <dc:date>2025-04-07T05:02:48Z</dc:date>
    </item>
    <item>
      <title>Re: Reading different file structures for json files in blob stores</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/117519#M45515</link>
      <description>&lt;P&gt;Variant should work in this scenario too. There have also been some performance improvements with variant, so much more of the metadata now has stats for efficient processing.&lt;/P&gt;
&lt;P&gt;You also don't have to go file by file. You can use things like Auto Loader, which checkpoints all the reads, or, if you prefer, you can use "*" in a location to denote everything in that path.&lt;/P&gt;</description>
      <pubDate>Fri, 02 May 2025 13:36:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/117519#M45515</guid>
      <dc:creator>holly</dc:creator>
      <dc:date>2025-05-02T13:36:08Z</dc:date>
    </item>
    <item>
      <title>Re: Reading different file structures for json files in blob stores</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/119397#M45860</link>
      <description>&lt;P&gt;I'll look at this once we go to production with the source files. I have split the logs by file type to simplify this, but I'll go back and look again at the test space with mixed files.&lt;/P&gt;</description>
      <pubDate>Fri, 16 May 2025 00:39:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/119397#M45860</guid>
      <dc:creator>turagittech</dc:creator>
      <dc:date>2025-05-16T00:39:01Z</dc:date>
    </item>
    <item>
      <title>Re: Reading different file structures for json files in blob stores</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/119409#M45865</link>
      <description>&lt;P class=""&gt;&lt;STRONG&gt;Organize files by schema&lt;/STRONG&gt; into subfolders (e.g., /schema_type_a/, /schema_type_b/) in the same container. Avoid putting all JSON types in one folder.&lt;/P&gt;</description>
      <pubDate>Fri, 16 May 2025 03:44:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-different-file-structures-for-json-files-in-blob-stores/m-p/119409#M45865</guid>
      <dc:creator>sandeepmankikar</dc:creator>
      <dc:date>2025-05-16T03:44:48Z</dc:date>
    </item>
  </channel>
</rss>

