Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to enforce schema checks and benefit from badRecordsPath when using Auto Loader

Swann
New Contributor

We would like to have a robust reader that ensures the data we read and write with Auto Loader respects the schema provided to the Auto Loader reader.

We also provide the option "badRecordsPath" (see https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html), which works fine with corrupted files and similar problems.
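For context, our reader looks roughly like the sketch below; the paths, the Parquet format, and the two-column schema are illustrative placeholders rather than our real configuration (`spark` is the session provided by the Databricks runtime):

```python
# Rough sketch of our Auto Loader reader; paths, format, and schema are
# illustrative placeholders.
from pyspark.sql.types import LongType, StringType, StructField, StructType

expected_schema = StructType([
    StructField("id", LongType(), True),         # source files sometimes hold DECIMAL(20, 0) here
    StructField("payload", StringType(), True),
])

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("badRecordsPath", "/mnt/bad-records/")  # corrupted files are logged here as expected
    .schema(expected_schema)                        # the schema we want enforced
    .load("/mnt/source-data/")
)
```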

We have an issue similar to the one documented in https://kb.databricks.com/data/wrong-schema-in-files.html, where the DECIMAL(20, 0) found in the source files is incompatible with the LONG we specify in our schema.
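The mismatch is easy to reproduce outside Auto Loader as well; a minimal standalone example with illustrative paths:

```python
# Write DECIMAL(20, 0) values to Parquet, then read them back with a LONG schema.
from pyspark.sql.functions import col
from pyspark.sql.types import LongType, StructField, StructType

(
    spark.range(10)
    .select(col("id").cast("decimal(20, 0)").alias("id"))
    .write.mode("overwrite")
    .parquet("/tmp/decimal-source/")
)

long_schema = StructType([StructField("id", LongType(), True)])

# Fails with a "Parquet column cannot be converted" exception instead of
# routing the offending file to badRecordsPath.
spark.read.schema(long_schema).parquet("/tmp/decimal-source/").show()
```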

The main question is then: is there a way to make Spark log to the location given in badRecordsPath when the above happens, rather than raising an exception (from which we cannot determine the file paths causing the issue)? As all of this is declarative, it depends heavily on the available options and on the implementation of "badRecordsPath".
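In other words, we would like to inspect these schema-mismatch files from the badRecordsPath log the same way we already inspect corrupted files, roughly as below; the timestamped directory layout and the "path"/"reason" fields are our reading of the handling-bad-records docs linked above:

```python
# Sketch: list the files that badRecordsPath recorded, assuming the
# <badRecordsPath>/<timestamp>/bad_files/ layout and the "path"/"reason"
# fields described in the linked docs.
bad_files = spark.read.json("/mnt/bad-records/*/bad_files/")
bad_files.select("path", "reason").show(truncate=False)
```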
