Hi, we're seeing this chain of errors every day, in different files and processes:
An error occurred while calling o11255.parquet.
: org.apache.spark.SparkException: Job aborted.
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 982.0 failed 4 times, most recent failure: Lost task 0.3 in stage 982.0 (TID 85705, 172.20.45.5, executor 31): org.apache.spark.SparkException: Task failed while writing rows.
Caused by: com.databricks.sql.io.FileReadException: Error while reading file dbfs: ... It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
Caused by: shaded.parquet.org.apache.thrift.transport.TTransportException: java.io.IOException: Stream is closed!
Caused by: java.io.IOException: Stream is closed!
Caused by: java.io.FileNotFoundException: dbfs:/...
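For reference, the FileReadException above explicitly suggests invalidating Spark's cached file metadata. In PySpark that would look roughly like this; the table name and path below are placeholders, not our real objects:

# Placeholder identifiers, not our real table/path.
spark.sql("REFRESH TABLE my_table")                 # what the error message suggests
spark.catalog.refreshTable("my_table")              # equivalent catalog API call
spark.catalog.refreshByPath("dbfs:/path/to/data")   # when reading files by path

We are not sure this would be enough, since the failures move between files and processes from day to day.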
Right now we fix it by deleting the affected file and re-running the job, but we don't know how to prevent the error from happening in the first place.
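Roughly, the manual workaround looks like this (path and DataFrame are illustrative; the actual failing file changes every day):

# Hypothetical path, for illustration only.
dbutils.fs.rm("dbfs:/path/to/output", recurse=True)          # delete the stale output
df.write.mode("overwrite").parquet("dbfs:/path/to/output")   # re-run the failing write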
Any ideas?
Thanks!