Using Auto Loader, I'm reading daily data partitioned by well. The data has a specific schema, but if a column has no value it isn't present in the JSON. For a specific column on a specific table I'm getting an error like:
Cannot convert long type to double type on merge.
If I've specified the schema on load in the DLT function, why would it throw this? If I read the entire partition with spark.read.json(path) it works fine; if I read it with spark.read.format("cloudFiles").load(path) it fails with the merge error.
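For context, here's a minimal sketch of the kind of DLT pipeline I mean. The table name, path, and column names (well_id, rate) are hypothetical placeholders, not my actual pipeline; spark is assumed to be the session Databricks provides in the pipeline environment.

```python
import dlt
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical source path and schema for illustration only.
SOURCE_PATH = "/mnt/raw/wells/"

SCHEMA = StructType([
    StructField("well_id", StringType()),
    StructField("rate", DoubleType()),  # the column that fails on merge
])

@dlt.table
def wells_bronze():
    # Schema is specified explicitly, so inference should not apply here.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .schema(SCHEMA)
        .load(SOURCE_PATH)
    )
```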
The column has some whole numbers like 0 and 1 and some decimals like 1.23456. My theory is that some wells return a file for a partition containing only whole numbers, so those files look like long values. I'm still stumped on why it would be inferring the schema instead of taking the specified one. Even if it were inferring, it's supposed to sample the first 1000 files or 50 GB of data, and there would never be that many files with only long values.
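The underlying type ambiguity is easy to demonstrate with plain JSON parsing: a whole number in JSON carries no decimal point, so any parser (and any inference built on top of one) sees an integer/long, while the same field in another file parses as a double. The field name rate below is just a placeholder.

```python
import json

# A file where the column happens to hold a decimal value...
mixed = json.loads('{"rate": 1.23456}')
# ...versus a file where every value is a whole number.
ints_only = json.loads('{"rate": 1}')

# The same logical column comes back as two different types.
print(type(mixed["rate"]).__name__)      # float
print(type(ints_only["rate"]).__name__)  # int
```

This is why an integers-only file can surface as long even though the column is logically a double.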