<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Delta Live Table with Autoloader issue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10121#M5366</link>
    <description>&lt;P&gt;Using Auto Loader, I'm reading daily data partitioned by well. The data has a specific schema, but if a column has no value, it isn't present in the JSON. For one column in one table, I'm getting an error like:&lt;/P&gt;&lt;P&gt;Cannot convert long type to double type on merge.&lt;/P&gt;&lt;P&gt;If I've specified the schema on load in the DLT function, why would it throw this? If I read the entire partition with spark.read.json(path), it works fine; if I read it with spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(path), it fails with the merge error.&lt;/P&gt;&lt;P&gt;The column contains some whole numbers, like 0 and 1, and some decimals, like 1.23456. My guess is that some wells return a file for a partition that contains only integer values. I'm still stumped as to why it would infer a schema instead of using the one I specified. Even if it were inferring the schema, it's supposed to sample the first 1,000 files or 50 GB of data, whichever limit is reached first, and there would never be that many files containing only long values.&lt;/P&gt;</description>
    <pubDate>Fri, 03 Feb 2023 04:43:44 GMT</pubDate>
    <dc:creator>Jfoxyyc</dc:creator>
    <dc:date>2023-02-03T04:43:44Z</dc:date>
    <item>
      <title>Delta Live Table with Autoloader issue</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10121#M5366</link>
      <description>&lt;P&gt;Using Auto Loader, I'm reading daily data partitioned by well. The data has a specific schema, but if a column has no value, it isn't present in the JSON. For one column in one table, I'm getting an error like:&lt;/P&gt;&lt;P&gt;Cannot convert long type to double type on merge.&lt;/P&gt;&lt;P&gt;If I've specified the schema on load in the DLT function, why would it throw this? If I read the entire partition with spark.read.json(path), it works fine; if I read it with spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(path), it fails with the merge error.&lt;/P&gt;&lt;P&gt;The column contains some whole numbers, like 0 and 1, and some decimals, like 1.23456. My guess is that some wells return a file for a partition that contains only integer values. I'm still stumped as to why it would infer a schema instead of using the one I specified. Even if it were inferring the schema, it's supposed to sample the first 1,000 files or 50 GB of data, whichever limit is reached first, and there would never be that many files containing only long values.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Feb 2023 04:43:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10121#M5366</guid>
      <dc:creator>Jfoxyyc</dc:creator>
      <dc:date>2023-02-03T04:43:44Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table with Autoloader issue</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10122#M5367</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;You can override the inferred schema by providing schema hints:&lt;/P&gt;&lt;P&gt;.option("cloudFiles.schemaHints", "name string, age int")&lt;/P&gt;&lt;P&gt;In your situation, I think the following should work:&lt;/P&gt;&lt;P&gt;.option("cloudFiles.schemaHints", "&amp;lt;column name&amp;gt; long")&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2023 16:20:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10122#M5367</guid>
      <dc:creator>Murthy1</dc:creator>
      <dc:date>2023-02-07T16:20:47Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table with Autoloader issue</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10123#M5368</link>
      <description>&lt;P&gt;The column is a double, but some of its values are longs, so I'm hoping a schema hint of "column_name double" works. I'll test it on a sample dataset that I think should reproduce the failure.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2023 20:04:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10123#M5368</guid>
      <dc:creator>Jfoxyyc</dc:creator>
      <dc:date>2023-02-10T20:04:23Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table with Autoloader issue</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10124#M5369</link>
      <description>&lt;P&gt;Hi @Jordan Fox,&lt;/P&gt;&lt;P&gt;I hope everything is going well.&lt;/P&gt;&lt;P&gt;I just wanted to check in: were you able to resolve your issue? If so, would you mark an answer as best so that other members can find the solution more quickly? If not, please let us know so we can help you.&lt;/P&gt;&lt;P&gt;Cheers!&lt;/P&gt;</description>
      <pubDate>Sat, 08 Apr 2023 07:35:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-with-autoloader-issue/m-p/10124#M5369</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-08T07:35:02Z</dc:date>
    </item>
  </channel>
</rss>

