<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153095#M53937</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;After reading your reply I breathed a sigh of relief. I spent hours just making sure my experiment aligns with the Databricks documentation.&lt;/P&gt;&lt;P&gt;Thank you so much for your attention to this issue!!&lt;/P&gt;&lt;P&gt;Great job &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Just FYI, I am struggling with one more issue (link below). It would be helpful if you could help me understand it.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153059#M53923" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153059#M53923&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 02 Apr 2026 19:17:13 GMT</pubDate>
    <dc:creator>mits1</dc:creator>
    <dc:date>2026-04-02T19:17:13Z</dc:date>
    <item>
      <title>Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/116779#M45372</link>
      <description>&lt;P&gt;&lt;U&gt;Two Issues:&lt;/U&gt;&lt;/P&gt;&lt;P&gt;1. What is the behavior of&amp;nbsp;cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?&lt;/P&gt;&lt;P&gt;2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint?&amp;nbsp; How does that affect the behavior of&amp;nbsp;cloudFiles.inferColumnTypes?&lt;/P&gt;&lt;P&gt;&lt;U&gt;Discussion:&lt;/U&gt;&lt;/P&gt;&lt;P&gt;1. I see example notebooks from databricks that use inferColumnTypes both WITH inferSchema:&amp;nbsp;&lt;A href="https://github.com/databricks/delta-live-tables-notebooks/blob/main/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py" target="_blank"&gt;delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databricks/delta-live-tables-notebooks · GitHub&lt;/A&gt;&amp;nbsp; &amp;nbsp; and WITHOUT inferSchema:&amp;nbsp;&lt;A href="https://github.com/databricks/delta-live-tables-notebooks/blob/main/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py" target="_blank"&gt;delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databricks/delta-live-tables-notebooks · GitHub&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;What is the use case for using both or only one of them? I would think that using both together is redundant and just creates unnecessary compute overhead. Except I find that's not necessarily true from my explorations on the behavior of these options.&lt;/P&gt;&lt;P&gt;2. 
Schema checkpoints: are they necessary or not?&lt;/P&gt;&lt;P&gt;All the documentation I find on cloudFiles.inferColumnTypes says that when using it, you must also define a schema checkpoint:&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/schema" target="_blank"&gt;Configure schema inference and evolution in Auto Loader - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;&lt;P&gt;However, I see some example notebooks from databricks that depict using&amp;nbsp;cloudFiles.inferColumnTypes = True without ever defining a schema checkpoint:&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-&amp;nbsp;&lt;A href="https://github.com/databricks/delta-live-tables-notebooks/blob/main/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py" target="_blank"&gt;delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databricks/delta-live-tables-notebooks · GitHub&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/databricks/delta-live-tables-notebooks/blob/main/change-data-capture-example/notebooks/2-Retail_DLT_CDC_Python.py" target="_blank"&gt;- delta-live-tables-notebooks/change-data-capture-example/notebooks/2-Retail_DLT_CDC_Python.py at main · databricks/delta-live-tables-notebooks · GitHub&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Apr 2025 13:56:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/116779#M45372</guid>
      <dc:creator>BF7</dc:creator>
      <dc:date>2025-04-28T13:56:25Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/116829#M45379</link>
      <description>&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Behavior of &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt; with and without &lt;CODE&gt;cloudFiles.inferSchema&lt;/CODE&gt;:&lt;/STRONG&gt;&lt;BR /&gt;When &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt; is enabled, Auto Loader attempts to identify the appropriate data types for columns instead of defaulting everything to strings, which is the default behavior for file formats like JSON, CSV, and XML.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Note that &lt;CODE&gt;cloudFiles.inferSchema&lt;/CODE&gt; is not listed among the documented Auto Loader options. Auto Loader infers the schema automatically whenever you do not supply one and a schema location is configured. If you provide an explicit schema, inference is skipped, and schema hints can override individual inferred types. Enabling &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt; makes that inference expose column data types determined from the sampled data. This is especially useful for file formats lacking inherent type encoding (e.g., CSV, JSON).&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Why use both:&lt;/STRONG&gt; I am not aware of documented behavior for &lt;CODE&gt;cloudFiles.inferSchema&lt;/CODE&gt;, so the option to rely on is &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt;. With it, Auto Loader infers both the schema structure (column names, new columns) and the column data types, reducing manual intervention in managing schema during ingestion.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Using &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt; without a schema checkpoint and its behavior:&lt;/STRONG&gt;&lt;BR /&gt;The documentation lists the schema checkpoint (&lt;CODE&gt;cloudFiles.schemaLocation&lt;/CODE&gt;) as required for schema inference, so in a regular Structured Streaming job you should set it whenever you rely on &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt;. The example notebooks you linked are Delta Live Tables pipelines, where the schema and checkpoint locations are managed automatically, which is why they never define one. Without a schema checkpoint, inferred schema changes cannot be tracked or persisted across runs, leading to potential issues when new data arrives with schema alterations.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;The schema checkpoint enables Auto Loader to persist schema evolution information and manage additions like new columns or changes in the data structure across micro-batches. Without one, there is nowhere to persist the inferred types, and schema consistency becomes the user’s responsibility (for example, by supplying an explicit schema).&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Using both &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt; and a schema checkpoint allows seamless management of schema evolution while ensuring column types are accurately inferred and tracked. Missing checkpoint information may result in redundant inference and susceptibility to runtime errors if data evolves unexpectedly.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
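&lt;P&gt;As a rough illustration of the recommended setup, here is a minimal Python sketch that pairs &lt;CODE&gt;cloudFiles.inferColumnTypes&lt;/CODE&gt; with a schema checkpoint. The helper names (&lt;CODE&gt;auto_loader_options&lt;/CODE&gt;, &lt;CODE&gt;read_json_stream&lt;/CODE&gt;) and the JSON source format are assumptions for illustration, not part of any Databricks API; only the option keys and the &lt;CODE&gt;spark.readStream&lt;/CODE&gt; calls are real.&lt;/P&gt;

```python
def auto_loader_options(schema_location, infer_column_types=True, schema_hints=None):
    """Build Auto Loader options (hypothetical helper, JSON source assumed)."""
    options = {
        "cloudFiles.format": "json",
        # Schema checkpoint: where the inferred schema is persisted across runs.
        "cloudFiles.schemaLocation": schema_location,
        # Surface inferred types in the DataFrame instead of all-string columns.
        "cloudFiles.inferColumnTypes": "true" if infer_column_types else "false",
    }
    if schema_hints:
        # Optional per-column overrides, for example "Age INT".
        options["cloudFiles.schemaHints"] = schema_hints
    return options


def read_json_stream(spark, source_path, options):
    """Start an Auto Loader read; needs a Databricks runtime that provides cloudFiles."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

&lt;P&gt;In a notebook this could be called as &lt;CODE&gt;read_json_stream(spark, source_path, auto_loader_options(schema_location))&lt;/CODE&gt;, where both paths are placeholders you supply.&lt;/P&gt;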
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps. BigRoux.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Apr 2025 17:57:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/116829#M45379</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-04-28T17:57:34Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/116834#M45382</link>
      <description>&lt;P&gt;Yes! This is exactly what I needed! Thank you so much!&lt;/P&gt;</description>
      <pubDate>Mon, 28 Apr 2025 19:01:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/116834#M45382</guid>
      <dc:creator>BF7</dc:creator>
      <dc:date>2025-04-28T19:01:34Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/152948#M53902</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;I explored something related and interesting (or confusing).&lt;/P&gt;
&lt;P&gt;It seems to conflict with the following statement in the Databricks documentation:&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080"&gt;&lt;SPAN&gt;"By default, Auto Loader schema inference seeks to avoid schema evolution issues due to type mismatches. For formats that don't encode data types (JSON, CSV, and XML), Auto Loader infers all columns as strings (including nested fields in JSON files)."&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;What I experienced recently is that Auto Loader DOES infer column types for a JSON file IN the SCHEMA FILE it creates at the schema location. It infers all columns as strings IN THE DATAFRAME ONLY.&lt;/P&gt;
&lt;P&gt;Below is my observation.&lt;/P&gt;
&lt;P&gt;Input data:&lt;/P&gt;
&lt;P&gt;{"Name":"Alfred","geneder":"M","Age":14}&lt;BR /&gt;{"Name":"John","geneder":"M","Age":12}&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Scenario 1: Without cloudFiles.inferColumnTypes&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;# Auto Loader stream over JSON files; inferColumnTypes is left at its default
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema4")
    .load("/Volumes/workspace/dev/input/")
)&lt;/PRE&gt;
&lt;P&gt;&lt;U&gt;DF schema&lt;/U&gt; -&amp;gt;&lt;/P&gt;
&lt;PRE&gt;df:pyspark.sql.connect.dataframe.DataFrame
Age:string
Name:string
geneder:string
_rescued_data:string&lt;/PRE&gt;
&lt;P&gt;&lt;U&gt;Autoloader schema file contents&lt;/U&gt; -&amp;gt;&lt;/P&gt;
&lt;PRE&gt;v1
{"dataSchemaJson":"{\"type\":\"struct\",\"fields\":[{\"name\":\"&lt;STRONG&gt;Age\",\"type\":\"long&lt;/STRONG&gt;\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"geneder\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}","partitionSchemaJson":"{\"type\":\"struct\",\"fields\":[]}"}&lt;/PRE&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;SPAN&gt;Note: It looks like Auto Loader still infers the data types in the schema file, but does not apply them to the DataFrame.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;HR /&gt;
&lt;P&gt;&lt;STRONG&gt;Scenario 2: With cloudFiles.inferColumnTypes&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Input data:&lt;/P&gt;
&lt;P&gt;{"Name":"Mits","geneder":"F","Age":35}&lt;/P&gt;
&lt;PRE&gt;# Same stream, now with inferColumnTypes enabled
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema4")
    .load("/Volumes/workspace/dev/input/")
)&lt;/PRE&gt;
&lt;P&gt;&lt;U&gt;DF schema&lt;/U&gt; -&amp;gt;&lt;/P&gt;
&lt;PRE&gt;df:pyspark.sql.connect.dataframe.DataFrame
&lt;STRONG&gt;Age:long&lt;/STRONG&gt;
Name:string
geneder:string
_rescued_data:string&lt;/PRE&gt;
&lt;P&gt;&lt;U&gt;Autoloader schema file contents&lt;/U&gt; -&amp;gt; No change!!&lt;/P&gt;
&lt;PRE&gt;v1
{"dataSchemaJson":"{\"type\":\"struct\",\"fields\":[{\"name\":\"Age\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"geneder\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}","partitionSchemaJson":"{\"type\":\"struct\",\"fields\":[]}"}&lt;/PRE&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;SPAN&gt;Note: The DataFrame's schema changes, but the Auto Loader schema file does not (obviously, because there is no change in the source data).&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;It would be very helpful for my understanding of schema inference if you could clarify this.&lt;/P&gt;
&lt;P&gt;Thanks in advance!!&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 18:21:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/152948#M53902</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-01T18:21:23Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153092#M53935</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;Great observation. I can see why this feels like it contradicts the docs. It doesn’t, but I agree the documentation could be clearer about what’s happening under the hood. Let me walk through it.&lt;/P&gt;
&lt;P class="p1"&gt;Auto Loader’s schema inference actually operates in two layers, and that’s the key to what you’re seeing.&lt;/P&gt;
&lt;P class="p1"&gt;First, the schema file (stored in _schemas at your schemaLocation) always captures the actual detected types from sampling your data. That’s why Age shows up as a long there. Auto Loader needs those real types to track schema evolution over time, detect type changes, and decide what should land in _rescued_data when something doesn’t match.&lt;/P&gt;
&lt;P class="p1"&gt;Second, the DataFrame schema is where cloudFiles.inferColumnTypes comes into play. When that option is false, which is the default for JSON, CSV, and XML, Auto Loader takes the inferred schema and casts everything to strings before exposing it in the DataFrame. That’s the “safe default” the docs are referring to. When you flip it to true, the DataFrame reflects the actual detected types from the schema file instead of flattening everything to strings.&lt;/P&gt;
&lt;P class="p1"&gt;So when the docs say “all columns are inferred as strings,” they’re really talking about the DataFrame output, not the schema file itself.&lt;/P&gt;
&lt;P class="p1"&gt;You can see this clearly in your scenarios. In Scenario 1, the schema file correctly records Age as a long, but the DataFrame shows it as a string. In Scenario 2, once you enable inferColumnTypes, the DataFrame starts reflecting the real types. The schema file doesn’t change because it already had the correct types from the start.&lt;/P&gt;
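&lt;P class="p1"&gt;If you want to check the schema file yourself, a small sketch like the following can locate the most recent version. It assumes the schema location is reachable as a regular file path (as with a Unity Catalog volume) and that versions under _schemas are stored as files with integer names; the helper name latest_schema_file is hypothetical.&lt;/P&gt;

```python
from pathlib import Path


def latest_schema_file(schema_location):
    """Return the newest file under schema_location/_schemas, or None.

    Assumption: schema versions are files with integer names (0, 1, 2, ...),
    so the largest integer is taken as the most recent version.
    """
    schemas_dir = Path(schema_location) / "_schemas"
    if not schemas_dir.is_dir():
        return None
    numbered = [p for p in schemas_dir.iterdir() if p.is_file() and p.name.isdigit()]
    # Compare names numerically so that "10" sorts after "9".
    return max(numbered, key=lambda p: int(p.name), default=None)
```

&lt;P class="p1"&gt;Calling read_text() on the returned path should show the version line followed by the JSON with the stored types, like the contents you posted.&lt;/P&gt;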
&lt;P class="p1"&gt;Here’s the clean way to think about it:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Schema file:&lt;/STRONG&gt; Always stores the true detected types, regardless of inferColumnTypes. This is by design for schema evolution.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;DataFrame:&lt;/STRONG&gt; Controlled by inferColumnTypes.
&lt;UL&gt;
&lt;LI&gt;False means everything is presented as strings.&lt;/LI&gt;
&lt;LI&gt;True means you get the actual detected types.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p1"&gt;Your instinct was spot on. Auto Loader is still inferring the data types; it just doesn’t surface them in the DataFrame unless you tell it to.&lt;/P&gt;
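&lt;P class="p1"&gt;To make the two layers concrete, here is a toy Python model, not Auto Loader's actual implementation. It parses the schema file contents you posted and shows what the DataFrame layer would report for each setting. The function names are made up for illustration, and the model ignores nested fields and the _rescued_data column.&lt;/P&gt;

```python
import json

# Schema file contents as posted in this thread: a version line, then one JSON line.
SCHEMA_FILE = r'''v1
{"dataSchemaJson":"{\"type\":\"struct\",\"fields\":[{\"name\":\"Age\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"geneder\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}","partitionSchemaJson":"{\"type\":\"struct\",\"fields\":[]}"}'''


def stored_types(schema_file_text):
    """Layer 1: return the column types recorded in the schema file."""
    _version, payload = schema_file_text.split("\n", 1)
    # dataSchemaJson is itself a JSON string, so it is decoded twice.
    data_schema = json.loads(json.loads(payload)["dataSchemaJson"])
    return {field["name"]: field["type"] for field in data_schema["fields"]}


def dataframe_types(schema_file_text, infer_column_types):
    """Layer 2 (toy model): strings unless inferColumnTypes is on."""
    types = stored_types(schema_file_text)
    if infer_column_types:
        return types
    return {name: "string" for name in types}


print(stored_types(SCHEMA_FILE))
print(dataframe_types(SCHEMA_FILE, infer_column_types=False))
print(dataframe_types(SCHEMA_FILE, infer_column_types=True))
```

&lt;P class="p1"&gt;The first and third calls report Age as long, while the second reports every column as string, which mirrors Scenario 1 and Scenario 2.&lt;/P&gt;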
&lt;P class="p1"&gt;Hope this helps, Lou.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 18:41:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153092#M53935</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-04-02T18:41:57Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153095#M53937</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;After reading your reply I breathed a sigh of relief. I spent hours just making sure my experiment aligns with the Databricks documentation.&lt;/P&gt;&lt;P&gt;Thank you so much for your attention to this issue!!&lt;/P&gt;&lt;P&gt;Great job &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Just FYI, I am struggling with one more issue (link below). It would be helpful if you could help me understand it.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153059#M53923" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153059#M53923&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 19:17:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153095#M53937</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-02T19:17:13Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153096#M53938</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;&amp;nbsp;, if you are happy with the answer please click on "Accept as Solution." It will give confidence to others.&amp;nbsp; Cheers, Lou.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 19:23:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153096#M53938</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-04-02T19:23:02Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153098#M53940</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;Of course. Where can I find the&amp;nbsp;&lt;SPAN&gt;"Accept as Solution" option?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 19:40:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153098#M53940</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-02T19:40:28Z</dc:date>
    </item>
    <item>
      <title>Re: Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153226#M53948</link>
      <description>&lt;P&gt;It appears that someone already accepted it as a solution. No further action needed.&amp;nbsp; Cheers, Louis.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2026 18:19:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-cloudfiles-infercolumntypes-with-inferschema-and-without/m-p/153226#M53948</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-04-03T18:19:59Z</dc:date>
    </item>
  </channel>
</rss>

