<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to enforce schema check and benefit from badRecordsPath when using autoloader in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-enforce-schema-check-and-benefit-from-badrecordspath-when/m-p/12693#M7458</link>
    <description>&lt;P&gt;We would like to have a robust reader that ensures the data we read and write using the autoloader respects the schema provided to the autoloader reader.&lt;/P&gt;&lt;P&gt;We also provide the option "badRecordsPath" (refer to &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html&lt;/A&gt;), which works fine with corrupted files etc.&lt;/P&gt;&lt;P&gt;We have an issue similar to the one documented in &lt;A href="https://kb.databricks.com/data/wrong-schema-in-files.html" target="_blank"&gt;https://kb.databricks.com/data/wrong-schema-in-files.html&lt;/A&gt;, where the DECIMAL(20, 0) found in the source files is incompatible with the LONG we specify in our schema.&lt;/P&gt;&lt;P&gt;The main question is then: is there a way to make Spark log to the location given in "badRecordsPath" when the above happens, rather than raising an exception (from which we cannot determine the file paths causing the issue)? As all this is declarative, it depends heavily on the available options and the implementation of "badRecordsPath".&lt;/P&gt;</description>
    <pubDate>Mon, 25 Jul 2022 08:54:11 GMT</pubDate>
    <dc:creator>Swann</dc:creator>
    <dc:date>2022-07-25T08:54:11Z</dc:date>
    <item>
      <title>How to enforce schema check and benefit from badRecordsPath when using autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-enforce-schema-check-and-benefit-from-badrecordspath-when/m-p/12693#M7458</link>
      <description>&lt;P&gt;We would like to have a robust reader that ensures the data we read and write using the autoloader respects the schema provided to the autoloader reader.&lt;/P&gt;&lt;P&gt;We also provide the option "badRecordsPath" (refer to &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html&lt;/A&gt;), which works fine with corrupted files etc.&lt;/P&gt;&lt;P&gt;We have an issue similar to the one documented in &lt;A href="https://kb.databricks.com/data/wrong-schema-in-files.html" target="_blank"&gt;https://kb.databricks.com/data/wrong-schema-in-files.html&lt;/A&gt;, where the DECIMAL(20, 0) found in the source files is incompatible with the LONG we specify in our schema.&lt;/P&gt;&lt;P&gt;The main question is then: is there a way to make Spark log to the location given in "badRecordsPath" when the above happens, rather than raising an exception (from which we cannot determine the file paths causing the issue)? As all this is declarative, it depends heavily on the available options and the implementation of "badRecordsPath".&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jul 2022 08:54:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-enforce-schema-check-and-benefit-from-badrecordspath-when/m-p/12693#M7458</guid>
      <dc:creator>Swann</dc:creator>
      <dc:date>2022-07-25T08:54:11Z</dc:date>
    </item>
  </channel>
</rss>
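The post describes an Auto Loader reader with an explicit schema plus the "badRecordsPath" option, and a DECIMAL(20, 0) vs LONG mismatch that raises instead of being quarantined. A minimal PySpark sketch of such a reader, with the widen-then-cast workaround, might look like the following. The paths, column names, and JSON source format are all illustrative assumptions, not details from the post, and the function is meant to run inside a Databricks job where a `SparkSession` is available:

```python
# A minimal sketch, assuming a Databricks environment that provides `spark`.
# All paths ("/mnt/landing/", "/mnt/bad-records/") and column names ("id",
# "payload") are hypothetical placeholders, not taken from the original post.
#
# Caveat matching the question: badRecordsPath quarantines corrupt or
# unparseable records, but a declared-schema type mismatch (DECIMAL(20, 0)
# in the files vs. LONG in the schema) surfaces as a read-time exception
# instead. One workaround is to declare the column with the type the files
# actually contain, then cast downstream so incompatible values become NULL
# rather than failing the whole stream.

reader_options = {
    "cloudFiles.format": "json",            # assumed source format
    "badRecordsPath": "/mnt/bad-records/",  # quarantine location (placeholder)
}


def build_autoloader_stream(spark, source_path="/mnt/landing/"):
    """Build an Auto Loader stream with an explicit schema (hypothetical helper)."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import (DecimalType, StringType, StructField,
                                   StructType)

    # Declare the column with the files' actual type, DECIMAL(20, 0)...
    schema = StructType([
        StructField("id", DecimalType(20, 0), True),
        StructField("payload", StringType(), True),
    ])

    reader = spark.readStream.format("cloudFiles").schema(schema)
    for key, value in reader_options.items():
        reader = reader.option(key, value)

    df = reader.load(source_path)

    # ...then cast to LONG here, so out-of-range values become NULL and can be
    # filtered out or routed to a quarantine table explicitly, instead of the
    # stream raising an exception that hides the offending file paths.
    return df.withColumn("id", F.col("id").cast("long"))
```

Rows where the cast produced NULL can then be separated with an `id IS NULL` filter and written wherever the bad-records quarantine lives, approximating the badRecordsPath behavior for this class of mismatch.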

