Hi, I am getting the following error:
com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4178615623264760328.c000.avro.
Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is 8P7bo1mnLPoLxVw==, retrieved bu+CiCkLm/kc6QA==.
where processYear, processMonth, processDay and processHour are partition columns.
However, this is actually just a WARN, and the job continues to execute (I am also able to read this file separately in a notebook). Eventually, though, the job dies with:
WARN Lost task 9026.0 in stage 324.0 (TID 1525596, 10.139.64.16, executor 83): TaskKilled (Stage cancelled)
I am using the following Databricks and Spark configuration:
RuntimeVersion: 5.5.x-scala2.11
MasterConfiguration:
NodeType: Standard_D32s_v3
NumberOfNodes: 1
WorkerConfiguration:
NodeType: Standard_D32s_v3
NumberOfNodes: 2
The same job is deployed in several other environments with much larger data volumes, and it does not fail there. Any idea why it might fail here?
Thanks!