Re: Databricks Autoloader Schema Evolution throws ...

robertkoss · 11-23-2023

Thank you for the answer.

Clearing the checkpoint data is, unfortunately, not an option. The Stream would reprocess all the data, which is exactly what I want to avoid since it runs incrementally.
Manual schema declaration is also not an option, since I want new columns to be added automatically.

What confuses me is that the StateSchemaNotCompatible exception is emitted from Spark Structured Streaming and is not an AutoLoader exception.

When I add a new column to the base table, the Stream fails with the NEW_FIELDS_IN_RECORD_WITH_FILE_PATH exception, which is expected when cloudFiles.schemaEvolutionMode is set to addNewColumns.

When I restart the Stream, it fails with StateSchemaNotCompatible. That shouldn't happen, since the schema should already have been updated when AutoLoader failed with the NEW_FIELDS_IN_RECORD_WITH_FILE_PATH exception.
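
The fail-then-restart cycle described above could be sketched roughly as follows. This is only an illustration, not the actual job: `run_with_restart` and `start_stream` are hypothetical names, and it assumes that the error class name shows up in the exception message raised by `awaitTermination()`.

```python
from typing import Callable

# Error class named in the post; assumed to appear in the exception message.
SCHEMA_CHANGE_MARKER = "NEW_FIELDS_IN_RECORD_WITH_FILE_PATH"


def run_with_restart(start_stream: Callable[[], object], max_restarts: int = 3) -> int:
    """Run a stream and restart it when it stops because new fields were found.

    `start_stream` is a hypothetical callable that returns a started
    streaming query. Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            query = start_stream()
            query.awaitTermination()  # raises if the query fails
            return restarts
        except Exception as exc:
            is_schema_change = SCHEMA_CHANGE_MARKER in str(exc)
            # Any other failure (e.g. StateSchemaNotCompatible) is re-raised.
            if not is_schema_change or restarts >= max_restarts:
                raise
            restarts += 1
```

With a loop like this, the expectation is that the second start picks up the evolved schema; a StateSchemaNotCompatible error on restart would propagate to the caller instead of being retried.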

My use case seems straightforward. I cannot imagine that I am the only one who tries to run AutoLoader with:

Structured Streaming
JSON files as source
Column Type Inference
Automated Schema Evolution
Delta as the target
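
A minimal sketch of a pipeline with the five properties listed above, assuming PySpark on Databricks. The function name and all paths are placeholders, and enabling `mergeSchema` on the Delta writer is an assumption about how the new columns reach the target table, not something stated in this thread.

```python
# Auto Loader reader options corresponding to the list above.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",                        # JSON files as source
    "cloudFiles.inferColumnTypes": "true",              # column type inference
    "cloudFiles.schemaEvolutionMode": "addNewColumns",  # automated schema evolution
}


def start_stream(spark, source_path, schema_location, checkpoint_location, target_path):
    """Start an Auto Loader stream that writes to a Delta table.

    `spark` is an existing SparkSession; every path argument is a placeholder.
    """
    reader = (
        spark.readStream.format("cloudFiles")
        .options(**AUTOLOADER_OPTIONS)
        # Where Auto Loader stores the inferred schema between runs.
        .option("cloudFiles.schemaLocation", schema_location)
    )
    return (
        reader.load(source_path)
        .writeStream.format("delta")
        .option("checkpointLocation", checkpoint_location)
        # Assumption: let the Delta sink accept newly added columns.
        .option("mergeSchema", "true")
        .start(target_path)
    )
```

Note that this sketch contains no stateful operator (aggregation, deduplication, stream-stream join), so it does not by itself reproduce the StateSchemaNotCompatible error.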