<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Delta Live Table autoloader's inferColumnTypes does not work in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-table-autoloader-s-infercolumntypes-does-not-work/m-p/39052#M26864</link>
    <description>&lt;P&gt;I am experimenting with Delta Live Tables (DLT) and Auto Loader. I have a simple, flat JSON file that I am trying to load into a DLT table (following &lt;A href="https://youtu.be/BIxwoO65ylY" target="_self"&gt;this guide&lt;/A&gt;) like so:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REFRESH STREAMING LIVE TABLE statistics_live
COMMENT "The raw statistics data"
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM cloud_files("/mnt/raw/statistics/", "json", map("cloudFiles.inferColumnTypes", "true"));&lt;/LI-CODE&gt;&lt;P&gt;The error message I am getting is: &lt;STRONG&gt;&lt;SPAN&gt;com.databricks.sql.cloudfiles.errors.CloudFilesAnalysisException: Failed to infer schema for format json from existing files in input path /mnt/raw/statistics/. Please ensure you configured the options properly or explicitly specify the schema.&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;The JSON file looks like this:&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;[
  {
    "pass": 26,
    "rush": 5,
    "total_return": 1,
    "total": 32,
    "fumble_return": 0,
    "int_return": 1,
    "kick_return": 0,
    "punt_return": 0,
    "other": 0
  }
]&lt;/LI-CODE&gt;&lt;P&gt;I've seen a lot of "answers" out there saying to just specify the schema, but I expect my schema to change over time, so that is not an option.&lt;/P&gt;&lt;P&gt;EDIT: &lt;SPAN&gt;Interestingly enough, I moved on to generating the full JSON file and storing it in our cloud storage rather than working with a partial file. The fully generated file was inferred correctly when I triggered the Auto Loader pipeline, complex child JSON properties and all. I'll leave the question up, though, because I have no clue why the partial file was throwing exceptions.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 03 Aug 2023 20:18:33 GMT</pubDate>
    <dc:creator>kwinsor5</dc:creator>
    <dc:date>2023-08-03T20:18:33Z</dc:date>
    <item>
      <title>Delta Live Table autoloader's inferColumnTypes does not work</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-autoloader-s-infercolumntypes-does-not-work/m-p/39052#M26864</link>
      <description>&lt;P&gt;I am experimenting with Delta Live Tables (DLT) and Auto Loader. I have a simple, flat JSON file that I am trying to load into a DLT table (following &lt;A href="https://youtu.be/BIxwoO65ylY" target="_self"&gt;this guide&lt;/A&gt;) like so:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REFRESH STREAMING LIVE TABLE statistics_live
COMMENT "The raw statistics data"
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM cloud_files("/mnt/raw/statistics/", "json", map("cloudFiles.inferColumnTypes", "true"));&lt;/LI-CODE&gt;&lt;P&gt;The error message I am getting is: &lt;STRONG&gt;&lt;SPAN&gt;com.databricks.sql.cloudfiles.errors.CloudFilesAnalysisException: Failed to infer schema for format json from existing files in input path /mnt/raw/statistics/. Please ensure you configured the options properly or explicitly specify the schema.&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;The JSON file looks like this:&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;[
  {
    "pass": 26,
    "rush": 5,
    "total_return": 1,
    "total": 32,
    "fumble_return": 0,
    "int_return": 1,
    "kick_return": 0,
    "punt_return": 0,
    "other": 0
  }
]&lt;/LI-CODE&gt;&lt;P&gt;I've seen a lot of "answers" out there saying to just specify the schema, but I expect my schema to change over time, so that is not an option.&lt;/P&gt;&lt;P&gt;EDIT: &lt;SPAN&gt;Interestingly enough, I moved on to generating the full JSON file and storing it in our cloud storage rather than working with a partial file. The fully generated file was inferred correctly when I triggered the Auto Loader pipeline, complex child JSON properties and all. I'll leave the question up, though, because I have no clue why the partial file was throwing exceptions.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Aug 2023 20:18:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-autoloader-s-infercolumntypes-does-not-work/m-p/39052#M26864</guid>
      <dc:creator>kwinsor5</dc:creator>
      <dc:date>2023-08-03T20:18:33Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table autoloader's inferColumnTypes does not work</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-autoloader-s-infercolumntypes-does-not-work/m-p/39057#M26867</link>
      <description>&lt;P&gt;Interestingly enough, I moved on to generating the full JSON file and storing it in our cloud storage rather than working with a partial file. The fully generated file was inferred correctly when I triggered the Auto Loader pipeline, complex child JSON properties and all. I'll leave the question up, though, because I have no clue why the partial file was throwing exceptions.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Aug 2023 20:18:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-autoloader-s-infercolumntypes-does-not-work/m-p/39057#M26867</guid>
      <dc:creator>kwinsor5</dc:creator>
      <dc:date>2023-08-03T20:18:01Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table autoloader's inferColumnTypes does not work</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-autoloader-s-infercolumntypes-does-not-work/m-p/80493#M36037</link>
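      <description>&lt;P&gt;I had the same issue with a JSON structure similar to yours. Setting the "multiLine" option to true fixed it for me. By default, Spark's JSON reader expects one complete JSON record per line, so a pretty-printed array that spans several lines is not parsed as a record unless multi-line mode is enabled.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# schemaLocation and landingZoneLocation are path variables defined elsewhere in the notebook.
df = (spark.readStream.format("cloudFiles")
  # Parse each file as a single JSON document instead of one record per line.
  .option("multiLine", "true")
  .option("cloudFiles.schemaLocation", schemaLocation)
  .option("cloudFiles.format", "json")
  .option("cloudFiles.inferColumnTypes", "true")
  # Add newly discovered columns to the schema as the data changes over time.
  .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
  .load(landingZoneLocation)
)&lt;/LI-CODE&gt;</description>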
      <pubDate>Thu, 25 Jul 2024 09:20:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-autoloader-s-infercolumntypes-does-not-work/m-p/80493#M36037</guid>
      <dc:creator>pavlos_skev</dc:creator>
      <dc:date>2024-07-25T09:20:41Z</dc:date>
    </item>
  </channel>
</rss>

