Autoloader [FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH]

Maxrb · 18 hours ago

Hi,

I am using Auto Loader to load Parquet files into Unity Catalog with the following settings:

.option("cloudFiles.format", "parquet")
.option("cloudFiles.inferColumnTypes", "true")
.option("cloudFiles.schemaEvolutionMode", "addNewColumnsWithTypeWidening")
.option("cloudFiles.rescuedDataColumn", "_rescued_data")

In one of the newest files, a column that used to be a timestamp is now a Long type. I was under the impression that such faulty records would just be moved to the `_rescued_data` column, but unfortunately the stream breaks, and I can only fix my pipeline with the badRecordsPath option.
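For context, here is a minimal sketch of how the options above, plus the badRecordsPath workaround, might be collected and applied to a reader. The helper names `autoloader_options` and `apply_options` and both paths in the usage comment are hypothetical, and the sketch leaves out any other options (such as a schema location) that the real pipeline may need.

```python
from typing import Any, Dict, Optional


def autoloader_options(bad_records_path: Optional[str] = None) -> Dict[str, str]:
    """Return the Auto Loader options from the post as a plain dict.

    bad_records_path is optional; when given, it adds the badRecordsPath
    workaround mentioned in the post. The path value itself is an assumption.
    """
    options = {
        "cloudFiles.format": "parquet",
        "cloudFiles.inferColumnTypes": "true",
        "cloudFiles.schemaEvolutionMode": "addNewColumnsWithTypeWidening",
        "cloudFiles.rescuedDataColumn": "_rescued_data",
    }
    if bad_records_path is not None:
        options["badRecordsPath"] = bad_records_path
    return options


def apply_options(reader: Any, options: Dict[str, str]) -> Any:
    """Call reader.option(key, value) for each entry and return the reader."""
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader


# Usage on Databricks (assumes an existing `spark` session; paths are made up):
# reader = apply_options(
#     spark.readStream.format("cloudFiles"),
#     autoloader_options(bad_records_path="/tmp/bad_records"),
# )
# df = reader.load("/path/to/landing/zone")
```

Keeping the options in a dict makes it easy to toggle badRecordsPath on and off while comparing the two behaviours.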

Why does this break my pipeline with `Expected Spark type timestamp, actual Parquet type INT64. SQLSTATE: KD001` instead of moving the bad data to `_rescued_data`?

Thanks in advance!
