Structured Streaming Auto Loader UnknownFieldsException and Workflow Retries

ilarsen
Contributor

Hi.

 

I am using Structured Streaming and Auto Loader to read JSON files, and the job is automated by a Databricks Workflow. The job fails when schema changes are detected, but it does not retry. Could someone point me in the right direction?

From what I understand of the documentation (Configure schema inference and evolution in Auto Loader - Azure Databricks | Microsoft Learn), the failure is expected behaviour. I have enabled retries on the Job Task, but the retries do not appear to be happening. A possible key point is that the job task runs a notebook, which in turn runs another notebook containing the Auto Loader code by calling dbutils.notebook.run().

Could the dbutils.notebook.run() approach be conflicting with the retry setting on the Task, so that no retry happens?
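One way to test this would be to retry inside the parent notebook itself, independent of the Task setting. This is only a minimal sketch, assuming the child notebook raises an exception when the stream stops on a schema change; `run_with_retries`, the retry count, and the notebook path in the usage comment are hypothetical names, not taken from my actual job.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(run_fn: Callable[[], T], max_retries: int = 3,
                     delay_seconds: float = 0.0) -> T:
    """Call run_fn and retry up to max_retries times if it raises."""
    attempt = 0
    while True:
        try:
            return run_fn()
        except Exception as exc:
            if attempt >= max_retries:
                # Retries exhausted: re-raise so the Job Task is marked as failed.
                raise
            attempt += 1
            print(f"Attempt {attempt} failed ({exc!r}); retrying")
            time.sleep(delay_seconds)


# Hypothetical usage inside a Databricks notebook, where dbutils is available:
# result = run_with_retries(
#     lambda: dbutils.notebook.run("./autoloader_notebook", 3600),
#     max_retries=2,
# )
```

If the stream restarts cleanly under a wrapper like this, that would suggest the Task-level retry is the part not firing, rather than Auto Loader itself.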

 

My stream reader has these options defined:

options = {
    "cloudFiles.useNotifications": "false",
    "cloudFiles.schemaLocation": <variable to checkpoint location here>,
    "cloudFiles.format": "json",
    "cloudFiles.includeExistingFiles": True,
    "cloudFiles.inferColumnTypes": True,
    "cloudFiles.useIncrementalListing": "true"
}

I notice that my options do not include cloudFiles.schemaEvolutionMode, but according to the docs the default mode applies when it is not set.
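To take the default out of the equation, the mode could also be set explicitly. This is a sketch only, assuming `addNewColumns` is the intended mode (the docs list it as the default when no schema is provided); `build_autoloader_options` and the example path are hypothetical, and all values are written as strings for consistency.

```python
def build_autoloader_options(schema_location: str) -> dict:
    """Return Auto Loader reader options with the evolution mode set explicitly."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
        # Explicit rather than relying on the default (assumed intended mode).
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        "cloudFiles.useNotifications": "false",
        "cloudFiles.includeExistingFiles": "true",
        "cloudFiles.inferColumnTypes": "true",
        "cloudFiles.useIncrementalListing": "true",
    }


# Hypothetical location, for illustration only.
options = build_autoloader_options("/tmp/example/_schema")
```

The resulting dictionary would be passed to the reader in the usual way, for example `spark.readStream.format("cloudFiles").options(**options)`.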

 

My stream writer has this option defined:

.option("mergeSchema", "true")

 

Does anyone have any ideas please?