<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Autolodaer schemaHints convert valid values to null in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autolodaer-schemahints-convert-valid-values-to-null/m-p/63329#M32209</link>
    <description>&lt;P&gt;I am ingesting json files from S3 using Autoloader and would like to use schemaHints to define the datatype of one of the fields, that is, I wanted the field id to be of integer type.&lt;/P&gt;&lt;P&gt;The DLT code below infers the the id as string, with correct values.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;create streaming table raw as
select * 
from cloud_files('/path', 'json',
  map("cloudFiles.inferColumnTypes", "true")
  );&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I would like the id column to be integer type, hence wanted to use schemaHints like below :&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;create streaming table raw as
select * 
from cloud_files('/path', 'json',
  map("cloudFiles.inferColumnTypes", "true",
      "cloudFiles.schemaHints", "id LONG"
  )
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The resulting table will have the correct data type but all the values of the column/field id are now null.&lt;/P&gt;&lt;P&gt;I've done some checking on the data such as &lt;EM&gt;&lt;STRONG&gt;isnotnull(int(id))&lt;/STRONG&gt; &lt;/EM&gt;to see if there is any record in not convertible to int but surprise surprise, all ids are castable to int.&lt;/P&gt;&lt;P&gt;I already have a work-around in mind. I just want to understand why would autoloader do that?&lt;/P&gt;&lt;P&gt;Is this a known issue/bug?&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 12 Mar 2024 06:07:31 GMT</pubDate>
    <dc:creator>_databreaks</dc:creator>
    <dc:date>2024-03-12T06:07:31Z</dc:date>
    <item>
      <title>Autolodaer schemaHints convert valid values to null</title>
      <link>https://community.databricks.com/t5/data-engineering/autolodaer-schemahints-convert-valid-values-to-null/m-p/63329#M32209</link>
      <description>&lt;P&gt;I am ingesting json files from S3 using Autoloader and would like to use schemaHints to define the datatype of one of the fields, that is, I wanted the field id to be of integer type.&lt;/P&gt;&lt;P&gt;The DLT code below infers the the id as string, with correct values.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;create streaming table raw as
select * 
from cloud_files('/path', 'json',
  map("cloudFiles.inferColumnTypes", "true")
  );&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I would like the id column to be integer type, hence wanted to use schemaHints like below :&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;create streaming table raw as
select * 
from cloud_files('/path', 'json',
  map("cloudFiles.inferColumnTypes", "true",
      "cloudFiles.schemaHints", "id LONG"
  )
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The resulting table will have the correct data type but all the values of the column/field id are now null.&lt;/P&gt;&lt;P&gt;I've done some checking on the data such as &lt;EM&gt;&lt;STRONG&gt;isnotnull(int(id))&lt;/STRONG&gt; &lt;/EM&gt;to see if there is any record in not convertible to int but surprise surprise, all ids are castable to int.&lt;/P&gt;&lt;P&gt;I already have a work-around in mind. I just want to understand why would autoloader do that?&lt;/P&gt;&lt;P&gt;Is this a known issue/bug?&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Mar 2024 06:07:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolodaer-schemahints-convert-valid-values-to-null/m-p/63329#M32209</guid>
      <dc:creator>_databreaks</dc:creator>
      <dc:date>2024-03-12T06:07:31Z</dc:date>
    </item>
  </channel>
</rss>

