<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader: struct field inferred as string in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131681#M49189</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt; Struct has its own benefits over variant. It's more memory efficient, and the full nested schema is shown in the Overview in Unity Catalog, while a variant column shows only the name of the first-level field. Basically, you can navigate through a struct very easily.&lt;/P&gt;&lt;P&gt;Variant also has many other limitations, described in &lt;A href="https://docs.databricks.com/aws/en/delta/variant#limitations" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/variant#limitations&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Thu, 11 Sep 2025 17:13:47 GMT</pubDate>
    <dc:creator>yit</dc:creator>
    <dc:date>2025-09-11T17:13:47Z</dc:date>
    <item>
      <title>Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131656#M49178</link>
      <description>&lt;P&gt;We are currently implementing Auto Loader for JSON files with nested struct fields. The goal is to detect the fields as structs and to have schema evolution.&lt;/P&gt;&lt;P&gt;The schema evolution mode is set to addNewColumns, and the inferColumnTypes option is set to true so that the real types of the fields are detected instead of making them all strings.&lt;/P&gt;&lt;P&gt;One of the fields is a deeply nested struct. There are some empty files in the folder as well. &lt;STRONG&gt;The problem is that Auto Loader infers the field as a string.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What could be the issue?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I cannot use schemaHints to define the field as a struct, because no single subfield exists in all files, while schemaHints requires at least one field to be defined for struct types.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2025 13:40:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131656#M49178</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-09-11T13:40:17Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131663#M49181</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175553"&gt;@yit&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;This is the expected and documented behaviour of Auto Loader schema inference:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1757598803515.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19915i07AE6A2D9B23F430/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1757598803515.png" alt="szymon_dybczak_0-1757598803515.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/schema#how-does-auto-loader-schema-inference-work" target="_blank"&gt;Configure schema inference and evolution in Auto Loader | Databricks on AWS&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2025 13:53:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131663#M49181</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-11T13:53:42Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131676#M49185</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt; Not if you set the option inferColumnTypes to true.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="yit_0-1757606885202.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19920i1185BC5FD33B07C7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="yit_0-1757606885202.png" alt="yit_0-1757606885202.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2025 16:08:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131676#M49185</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-09-11T16:08:19Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131677#M49186</link>
      <description>&lt;P&gt;Thanks, good to know about this option. But if the &lt;SPAN&gt;inferColumnTypes option doesn't work in this case, maybe you can try the new VARIANT type instead?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2025 16:34:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131677#M49186</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-11T16:34:18Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131681#M49189</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt; Struct has its own benefits over variant. It's more memory efficient, and the full nested schema is shown in the Overview in Unity Catalog, while a variant column shows only the name of the first-level field. Basically, you can navigate through a struct very easily.&lt;/P&gt;&lt;P&gt;Variant also has many other limitations, described in &lt;A href="https://docs.databricks.com/aws/en/delta/variant#limitations" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/variant#limitations&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2025 17:13:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131681#M49189</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-09-11T17:13:47Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131684#M49191</link>
      <description>&lt;P&gt;Yep, it does have limitations. But have you seen a benchmark comparing the memory efficiency of variant and struct? I'm quite sure that in some marketing materials Databricks claimed that variant should be much more efficient, especially when working with JSON files. So it's quite interesting. Maybe that's another reason to run your own benchmarks and always double-check marketing claims &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2025 17:22:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/131684#M49191</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-11T17:22:17Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/132066#M49340</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt; I did not benchmark memory efficiency, as it's not the main criterion we are evaluating. Considering the current scenario, we definitely want to use struct.&lt;BR /&gt;&lt;STRONG&gt;The problem to solve is this: how can an empty struct be defined in schema hints? Currently it raises an error saying that at least one field in the struct must be defined.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Sep 2025 06:28:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/132066#M49340</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-09-16T06:28:21Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader: struct field inferred as string</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/132116#M49358</link>
      <description>&lt;P&gt;It is possible to define an empty struct as a column type through schema hints, but schema evolution won't be applied if subfields later appear in the data for that column.&lt;/P&gt;&lt;P&gt;Conclusions when working with JSON files and 'addNewColumns' as the schema evolution mode:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;You can provide a partial schema through schema hints.&lt;/LI&gt;&lt;LI&gt;If you give a schema hint for a field of type struct, you must provide the full schema of the struct. Otherwise it will raise an error, as schema evolution does not apply to fields for which schema hints are defined.&lt;/LI&gt;&lt;LI&gt;If a column contains both struct and array values, Auto Loader infers it as a string, the most generic type that can represent both structs and arrays.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 16 Sep 2025 13:22:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-struct-field-inferred-as-string/m-p/132116#M49358</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-09-16T13:22:10Z</dc:date>
    </item>
  </channel>
</rss>

