Using Auto Loader, I'm reading daily data partitioned by well. The data has a specific schema, but if a column has no value it isn't present in the JSON. For a specific column on a specific table I'm getting an error like:
Cannot convert long type to double type on merge.
If I've specified the schema on load in the DLT function, why would it throw this? If I read the entire partition with spark.read.json(path) it works fine; if I read it with spark.read.format("cloudFiles").load(path) it fails with the merge error.
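For context, here's a minimal sketch of the kind of DLT pipeline I mean. The table name, path, and column names (well_id, rate) are hypothetical placeholders, not my actual pipeline; spark is assumed to be the session Databricks provides in the pipeline environment.

```python
import dlt
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical source path and schema for illustration only.
SOURCE_PATH = "/mnt/raw/wells/"

SCHEMA = StructType([
    StructField("well_id", StringType()),
    StructField("rate", DoubleType()),  # the column that fails on merge
])

@dlt.table
def wells_bronze():
    # Schema is specified explicitly, so inference should not apply here.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .schema(SCHEMA)
        .load(SOURCE_PATH)
    )
```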
The column has some whole numbers like 0 and 1 and some decimals like 1.23456. My theory is that some wells return a file for a partition containing only whole numbers, so those files look like long values. I'm still stumped on why it would be inferring the schema instead of taking the specified one. Even if it were inferring, it's supposed to sample the first 1000 files or 50 GB of data, and there would never be that many files with only long values.
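The underlying type ambiguity is easy to demonstrate with plain JSON parsing: a whole number in JSON carries no decimal point, so any parser (and any inference built on top of one) sees an integer/long, while the same field in another file parses as a double. The field name rate below is just a placeholder.

```python
import json

# A file where the column happens to hold a decimal value...
mixed = json.loads('{"rate": 1.23456}')
# ...versus a file where every value is a whole number.
ints_only = json.loads('{"rate": 1}')

# The same logical column comes back as two different types.
print(type(mixed["rate"]).__name__)      # float
print(type(ints_only["rate"]).__name__)  # int
```

This is why an integers-only file can surface as long even though the column is logically a double.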