Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to enforce schema checks and benefit from badRecordsPath when using Auto Loader

Swann
New Contributor

We would like to have a robust reader that ensures the data we read and write with Auto Loader respects the schema provided to the Auto Loader reader.

We also provide the option "badRecordsPath" (see https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html), which works fine with corrupted files and similar problems.
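For context, our reader looks roughly like the sketch below; the paths, the Parquet format, and the two-column schema are illustrative placeholders rather than our real configuration (`spark` is the session provided by the Databricks runtime):

```python
# Rough sketch of our Auto Loader reader; paths, format, and schema are
# illustrative placeholders.
from pyspark.sql.types import LongType, StringType, StructField, StructType

expected_schema = StructType([
    StructField("id", LongType(), True),         # source files sometimes hold DECIMAL(20, 0) here
    StructField("payload", StringType(), True),
])

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("badRecordsPath", "/mnt/bad-records/")  # corrupted files are logged here as expected
    .schema(expected_schema)                        # the schema we want enforced
    .load("/mnt/source-data/")
)
```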

We have an issue similar to the one documented in https://kb.databricks.com/data/wrong-schema-in-files.html, where the DECIMAL(20, 0) found in the source files is incompatible with the LONG we specify in our schema.
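The mismatch is easy to reproduce outside Auto Loader as well; a minimal standalone example with illustrative paths:

```python
# Write DECIMAL(20, 0) values to Parquet, then read them back with a LONG schema.
from pyspark.sql.functions import col
from pyspark.sql.types import LongType, StructField, StructType

(
    spark.range(10)
    .select(col("id").cast("decimal(20, 0)").alias("id"))
    .write.mode("overwrite")
    .parquet("/tmp/decimal-source/")
)

long_schema = StructType([StructField("id", LongType(), True)])

# Fails with a "Parquet column cannot be converted" exception instead of
# routing the offending file to badRecordsPath.
spark.read.schema(long_schema).parquet("/tmp/decimal-source/").show()
```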

The main question is then: is there a way to make Spark log to the location given in badRecordsPath when the above happens, rather than raising an exception (from which we cannot determine the file paths causing the issue)? As all of this is declarative, it depends heavily on the available options and on the implementation of "badRecordsPath".
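In other words, we would like to inspect these schema-mismatch files from the badRecordsPath log the same way we already inspect corrupted files, roughly as below; the timestamped directory layout and the "path"/"reason" fields are our reading of the handling-bad-records docs linked above:

```python
# Sketch: list the files that badRecordsPath recorded, assuming the
# <badRecordsPath>/<timestamp>/bad_files/ layout and the "path"/"reason"
# fields described in the linked docs.
bad_files = spark.read.json("/mnt/bad-records/*/bad_files/")
bad_files.select("path", "reason").show(truncate=False)
```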
