<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic AutoLoader Type Widening in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143069#M52088</link>
    <description>&lt;P&gt;&lt;SPAN&gt;I have a few doubts regarding AutoLoader behavior and capabilities. Please check and correct wherever my assumptions or understanding are incorrect, much appreciated. Below is my specific code&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Example scenario:&lt;BR /&gt;Target Managed Delta Table (Type Widening enabled) has a field -&amp;nbsp; 'Quantity' , Type - int.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;My code:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; (&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; spark.readStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;format&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.format"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"csv"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.schemaLocation"&lt;/SPAN&gt;&lt;SPAN&gt;, schema_location)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# Schema Inference&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.inferColumnTypes"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;)&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.schemaEvolutionMode"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"addNewColumns"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.rescuedDataColumn"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"_rescued_data"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.allowOverwrites"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"header"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;load&lt;/SPAN&gt;&lt;SPAN&gt;(source_dir)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;withColumn&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"_rescued_data"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;col&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"_rescued_data"&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;cast&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"string"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;)&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;upsert_batch&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;microBatchDF&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;batchId&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;int&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;# If target table doesn't exist yet, bootstrap it from inferred schema&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;if&lt;/SPAN&gt; &lt;SPAN&gt;not&lt;/SPAN&gt; &lt;SPAN&gt;table_exists_uc&lt;/SPAN&gt;&lt;SPAN&gt;(target_table):&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (microBatchDF.write&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;format&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"delta"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;mode&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"overwrite"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;saveAsTable&lt;/SPAN&gt;&lt;SPAN&gt;(target_table)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; )&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;return&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; delta_t &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; DeltaTable.&lt;/SPAN&gt;&lt;SPAN&gt;forName&lt;/SPAN&gt;&lt;SPAN&gt;(spark, target_table)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; (delta_t.&lt;/SPAN&gt;&lt;SPAN&gt;alias&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"t"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;merge&lt;/SPAN&gt;&lt;SPAN&gt;(microBatchDF.&lt;/SPAN&gt;&lt;SPAN&gt;alias&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"s"&lt;/SPAN&gt;&lt;SPAN&gt;), &lt;/SPAN&gt;&lt;SPAN&gt;"t.order_id = s.order_id"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;withSchemaEvolution&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;whenMatchedUpdateAll&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;whenNotMatchedInsertAll&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;execute&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; )&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;query &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; (&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; df.writeStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;foreachBatch&lt;/SPAN&gt;&lt;SPAN&gt;(upsert_batch)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"checkpointLocation"&lt;/SPAN&gt;&lt;SPAN&gt;, checkpoint_location)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;trigger&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;availableNow&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;start&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;SPAN&gt;Let's say on our first run we ingest a file which has all rows with quantity field as int, so my assumption our stored schema 0 should have quantity inferred as an int.&lt;BR /&gt;On our next run we load a file which has float values in the quantity column. So tell me, in this case will our target delta table automatically widen to accomodate this new file data which is a float? Before our writestream happens in the readstream portion itself my understanding is that since the latest schema it has mentions quantity with type int, the readstream will try to parse the float values against the int type and will not be able to do so these will end in rescued data and hence the target table quantity field will never see these float values. Please confirm this behavior.&lt;BR /&gt;Also as an additional question, does Autoloader ever parse floats to int (Ex : 4.5 -&amp;gt; 4 OR 2.0 -&amp;gt; 2) and if not, why?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Parth&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 06 Jan 2026 05:57:23 GMT</pubDate>
    <dc:creator>parth_db</dc:creator>
    <dc:date>2026-01-06T05:57:23Z</dc:date>
    <item>
      <title>AutoLoader Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143069#M52088</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I have a few doubts regarding AutoLoader behavior and capabilities. Please check and correct wherever my assumptions or understanding are incorrect, much appreciated. Below is my specific code&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Example scenario:&lt;BR /&gt;Target Managed Delta Table (Type Widening enabled) has a field -&amp;nbsp; 'Quantity' , Type - int.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;My code:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; (&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; spark.readStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;format&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.format"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"csv"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.schemaLocation"&lt;/SPAN&gt;&lt;SPAN&gt;, schema_location)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# Schema Inference&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.inferColumnTypes"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;)&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.schemaEvolutionMode"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"addNewColumns"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.rescuedDataColumn"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"_rescued_data"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"cloudFiles.allowOverwrites"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"header"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;load&lt;/SPAN&gt;&lt;SPAN&gt;(source_dir)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;withColumn&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"_rescued_data"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;col&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"_rescued_data"&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;cast&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"string"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;)&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;upsert_batch&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;microBatchDF&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;batchId&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;int&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;# If target table doesn't exist yet, bootstrap it from inferred schema&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;if&lt;/SPAN&gt; &lt;SPAN&gt;not&lt;/SPAN&gt; &lt;SPAN&gt;table_exists_uc&lt;/SPAN&gt;&lt;SPAN&gt;(target_table):&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (microBatchDF.write&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;format&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"delta"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;mode&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"overwrite"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;saveAsTable&lt;/SPAN&gt;&lt;SPAN&gt;(target_table)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; )&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;return&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; delta_t &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; DeltaTable.&lt;/SPAN&gt;&lt;SPAN&gt;forName&lt;/SPAN&gt;&lt;SPAN&gt;(spark, target_table)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; (delta_t.&lt;/SPAN&gt;&lt;SPAN&gt;alias&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"t"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;merge&lt;/SPAN&gt;&lt;SPAN&gt;(microBatchDF.&lt;/SPAN&gt;&lt;SPAN&gt;alias&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"s"&lt;/SPAN&gt;&lt;SPAN&gt;), &lt;/SPAN&gt;&lt;SPAN&gt;"t.order_id = s.order_id"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;withSchemaEvolution&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;whenMatchedUpdateAll&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;whenNotMatchedInsertAll&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;execute&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; )&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;query &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; (&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; df.writeStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;foreachBatch&lt;/SPAN&gt;&lt;SPAN&gt;(upsert_batch)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"checkpointLocation"&lt;/SPAN&gt;&lt;SPAN&gt;, checkpoint_location)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;trigger&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;availableNow&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;start&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;SPAN&gt;Let's say on our first run we ingest a file which has all rows with quantity field as int, so my assumption our stored schema 0 should have quantity inferred as an int.&lt;BR /&gt;On our next run we load a file which has float values in the quantity column. So tell me, in this case will our target delta table automatically widen to accomodate this new file data which is a float? Before our writestream happens in the readstream portion itself my understanding is that since the latest schema it has mentions quantity with type int, the readstream will try to parse the float values against the int type and will not be able to do so these will end in rescued data and hence the target table quantity field will never see these float values. Please confirm this behavior.&lt;BR /&gt;Also as an additional question, does Autoloader ever parse floats to int (Ex : 4.5 -&amp;gt; 4 OR 2.0 -&amp;gt; 2) and if not, why?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Parth&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jan 2026 05:57:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143069#M52088</guid>
      <dc:creator>parth_db</dc:creator>
      <dc:date>2026-01-06T05:57:23Z</dc:date>
    </item>
    <item>
      <title>Re: AutoLoader Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143119#M52099</link>
      <description>&lt;P&gt;&lt;!--  StartFragment   --&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Auto Loader does not change column types on its own. It can add new columns automatically, but won’t widen types of existing columns.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;With your current settings and an initially inferred Quantity INT, if the next CSV has float values, Auto Loader will try to parse them as INT. Those values won’t load as floats; they’ll either become NULL and/or be captured in _&lt;/SPAN&gt;&lt;SPAN class=""&gt;rescued_data&lt;/SPAN&gt;&lt;SPAN class=""&gt; (depending on the format and parsing rules). They won’t reach your target Quantity as floats. &lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Type widening is a Delta Lake table feature (DR 15.4+), not an Auto Loader feature. It can widen INT &lt;/SPAN&gt;&lt;SPAN class=""&gt;→ DOUBLE/DECIMAL/LONG, but only during writes and when enabled on the target table. You still need the read side to produce data of the widened type (e.g., cast to DOUBLE) before MERGE. &lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Auto Loader won’t “round” floats to &lt;/SPAN&gt;&lt;SPAN class=""&gt;ints&lt;/SPAN&gt;&lt;SPAN class=""&gt; for you. It doesn’t silently coerce 4.5 → 4 or 2.0 → 2. If you explicitly cast in Spark, standard casting rules apply (fraction truncated), but Auto Loader itself avoids implicit lossy conversions for data-quality reasons.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;!--  EndFragment   --&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jan 2026 19:05:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143119#M52099</guid>
      <dc:creator>nayan_wylde</dc:creator>
      <dc:date>2026-01-06T19:05:58Z</dc:date>
    </item>
    <item>
      <title>Re: AutoLoader Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143127#M52102</link>
      <description>&lt;P&gt;Thanks for the response. As a final verification, my understanding is that Autoloader schema inference and schema evolution only tracks when colums are added in our schema, which then get get stored as the new schema version. In all the rest cases - column drops, type mismatches (such as float to an int col) the schema remains the same and will get rescued or ignored as per our rescue column settings. Therefore even if our target delta table has type widening enabled - since our schema doesn’t widen itself our target delta table will not get widened via autoloader unless we manually update our schema column data type to allow accurate parsing.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1) Could you tell me if there is any configuration/option/parameter/setting to override this behavior and force schema data type widening to columns?&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;2) Is a same behavior seen even if we are working with Delta Live Tables?&lt;/P&gt;&lt;P&gt;3) And as sort of a inspiration is there any approach you would suggest to handle such schema changes when we are working with upstream data (100s of columns) that keeps on changing so providing an explicit schema or schema hints is counterproductive, and our initial inference may result in too narrow types causing mismatches in the future.&lt;/P&gt;&lt;P&gt;Apologies for the long questions, I am trying to understand Autoloader functions and capabilities to handle such edge cases when upstream is not stable and clean.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Parth&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jan 2026 21:29:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143127#M52102</guid>
      <dc:creator>parth_db</dc:creator>
      <dc:date>2026-01-06T21:29:44Z</dc:date>
    </item>
    <item>
      <title>Re: AutoLoader Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143215#M52120</link>
      <description>&lt;P&gt;1. No, Auto Loader does not provide an option to automatically widen column types. Its schema evolution modes are:&lt;/P&gt;&lt;P&gt;addNewColumns → Adds new nullable columns.&lt;BR /&gt;rescue → Captures unexpected fields in _rescued_data.&lt;BR /&gt;failOnNewColumns → Stops the stream on schema drift.&lt;/P&gt;&lt;P&gt;Type widening is a Delta Lake feature, not an Auto Loader feature. You can enable it on the target table with&lt;/P&gt;&lt;LI-CODE lang="python"&gt;ALTER TABLE &amp;lt;table_name&amp;gt; SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2. Yes, Delta Live Tables (DLT) uses Auto Loader under the hood for streaming ingestion. The default behavior is the same:&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;UL&gt;&lt;LI&gt;Adds new columns when detected.&lt;/LI&gt;&lt;LI&gt;Does &lt;STRONG&gt;not&lt;/STRONG&gt; change existing column types automatically. For type widening, you rely on Delta Lake’s schema evolution and type widening features in the Silver/Gold layers. DLT does not override Auto Loader’s limitations.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;3.&amp;nbsp;&lt;!--  StartFragment   --&gt;&lt;SPAN class=""&gt;Suggested approach for unstable upstream schemas 100s of columns&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;When explicit schemas or hints are impractical, consider these strategies:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Use schema-on-read with flexible types&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;For CSV/JSON, Auto Loader often infers all columns as STRING if &lt;/SPAN&gt;&lt;SPAN class=""&gt;inferColumnTypes&lt;/SPAN&gt;&lt;SPAN class=""&gt; is off. You can then cast selectively &lt;/SPAN&gt;&lt;SPAN class=""&gt;downstream.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Alternatively&lt;/SPAN&gt;&lt;SPAN class=""&gt;, set &lt;/SPAN&gt;&lt;SPAN class=""&gt;cloudFiles&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;inferColumnTypes&lt;/SPAN&gt;&lt;SPAN class=""&gt;=true but combine with &lt;/SPAN&gt;&lt;SPAN class=""&gt;schemaHints&lt;/SPAN&gt;&lt;SPAN class=""&gt; for critical columns&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Leverage _&lt;/SPAN&gt;&lt;SPAN class=""&gt;rescued_data&lt;/SPAN&gt;&lt;SPAN class=""&gt; for audit and &lt;/SPAN&gt;&lt;SPAN class=""&gt;repair.Keep&lt;/SPAN&gt;&lt;SPAN class=""&gt; _&lt;/SPAN&gt;&lt;SPAN class=""&gt;rescued_data&lt;/SPAN&gt;&lt;SPAN class=""&gt; in Bronze for anomaly &lt;/SPAN&gt;&lt;SPAN class=""&gt;detection.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Apply&lt;/SPAN&gt;&lt;SPAN class=""&gt; transformations in Silver to reconcile rescued values when needed.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Cast numeric columns to DOUBLE or DECIMAL in the Bronze layer.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Enable column mapping on the Delta table ('&lt;/SPAN&gt;&lt;SPAN class=""&gt;delta.columnMapping.mode&lt;/SPAN&gt;&lt;SPAN class=""&gt;' = 'name') to handle column renames and drops as metadata-only changes without data rewrites, which improves performance and resilience to structural changes.&lt;/SPAN&gt;&amp;nbsp;&lt;/LI&gt;&lt;/UL&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 07 Jan 2026 16:16:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143215#M52120</guid>
      <dc:creator>nayan_wylde</dc:creator>
      <dc:date>2026-01-07T16:16:58Z</dc:date>
    </item>
    <item>
      <title>Re: AutoLoader Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143221#M52121</link>
      <description>&lt;P&gt;This provides alot of clarity. Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jan 2026 16:25:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143221#M52121</guid>
      <dc:creator>parth_db</dc:creator>
      <dc:date>2026-01-07T16:25:38Z</dc:date>
    </item>
    <item>
      <title>Re: AutoLoader Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143229#M52124</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79394"&gt;@nayan_wylde&lt;/a&gt;&amp;nbsp;for the details. This is really useful.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jan 2026 16:45:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-type-widening/m-p/143229#M52124</guid>
      <dc:creator>Sanjeeb2024</dc:creator>
      <dc:date>2026-01-07T16:45:14Z</dc:date>
    </item>
  </channel>
</rss>

