<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Schema inference with auto loader (non-DLT and DLT) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/schema-inference-with-auto-loader-non-dlt-and-dlt/m-p/53413#M29791</link>
    <description>&lt;P&gt;Hi.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another question, this time about schema inference and column types.&amp;nbsp; I have dabbled with DLT and with Structured Streaming plus Auto Loader (that is, without DLT).&amp;nbsp; My data source is JSON files that contain nested structures.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I noticed that in the resulting DLT streaming table, all columns were strings.&amp;nbsp; In the Delta table produced by the Structured Streaming + Auto Loader approach, the nested columns are structs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is this the option&amp;nbsp;cloudFiles.inferColumnTypes at work?&lt;/LI&gt;&lt;LI&gt;As I understand it from the doc, if I were to set it to&amp;nbsp;&lt;EM&gt;false&lt;/EM&gt; in the non-DLT Structured Streaming approach, the columns would all be strings, correct?&lt;/LI&gt;&lt;LI&gt;It doesn't look like I set anything for that option in the DLT declaration, so is &lt;EM&gt;false&lt;/EM&gt; the default for DLT?&amp;nbsp; Based on the doc, I assume that is the case:&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="markup"&gt;cloudFiles.inferColumnTypes
Type: Boolean
Whether to infer exact column types when leveraging schema inference. By default, columns are inferred as strings when inferring JSON and CSV datasets. See schema inference for more details.
Default value: false&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;If I set cloudFiles.inferColumnTypes to&amp;nbsp;&lt;EM&gt;false&lt;/EM&gt;&amp;nbsp;in the Structured Streaming approach, would schema changes inside those nested columns no longer cause schema evolution failures, because the columns are just strings instead of structs?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;BR /&gt;Cheers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 21 Nov 2023 23:27:15 GMT</pubDate>
    <dc:creator>ilarsen</dc:creator>
    <dc:date>2023-11-21T23:27:15Z</dc:date>
    <item>
      <title>Schema inference with auto loader (non-DLT and DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-inference-with-auto-loader-non-dlt-and-dlt/m-p/53413#M29791</link>
      <description>&lt;P&gt;Hi.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another question, this time about schema inference and column types.&amp;nbsp; I have dabbled with DLT and with Structured Streaming plus Auto Loader (that is, without DLT).&amp;nbsp; My data source is JSON files that contain nested structures.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I noticed that in the resulting DLT streaming table, all columns were strings.&amp;nbsp; In the Delta table produced by the Structured Streaming + Auto Loader approach, the nested columns are structs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is this the option&amp;nbsp;cloudFiles.inferColumnTypes at work?&lt;/LI&gt;&lt;LI&gt;As I understand it from the doc, if I were to set it to&amp;nbsp;&lt;EM&gt;false&lt;/EM&gt; in the non-DLT Structured Streaming approach, the columns would all be strings, correct?&lt;/LI&gt;&lt;LI&gt;It doesn't look like I set anything for that option in the DLT declaration, so is &lt;EM&gt;false&lt;/EM&gt; the default for DLT?&amp;nbsp; Based on the doc, I assume that is the case:&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="markup"&gt;cloudFiles.inferColumnTypes
Type: Boolean
Whether to infer exact column types when leveraging schema inference. By default, columns are inferred as strings when inferring JSON and CSV datasets. See schema inference for more details.
Default value: false&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;If I set cloudFiles.inferColumnTypes to&amp;nbsp;&lt;EM&gt;false&lt;/EM&gt;&amp;nbsp;in the Structured Streaming approach, would schema changes inside those nested columns no longer cause schema evolution failures, because the columns are just strings instead of structs?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;BR /&gt;Cheers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Nov 2023 23:27:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-inference-with-auto-loader-non-dlt-and-dlt/m-p/53413#M29791</guid>
      <dc:creator>ilarsen</dc:creator>
      <dc:date>2023-11-21T23:27:15Z</dc:date>
    </item>
    <item>
      <title>Re: Schema inference with auto loader (non-DLT and DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-inference-with-auto-loader-non-dlt-and-dlt/m-p/58289#M31085</link>
      <description>&lt;P&gt;A late thank you for your reply, Kaniz.&amp;nbsp; From my experience on the platform so far, I like what schema inference does, and I prefer to use it.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2024 20:45:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-inference-with-auto-loader-non-dlt-and-dlt/m-p/58289#M31085</guid>
      <dc:creator>ilarsen</dc:creator>
      <dc:date>2024-01-23T20:45:35Z</dc:date>
    </item>
  </channel>
</rss>

