Error while reading file from Cloud Storage

DylanStout — Thu, 20 Mar 2025 15:31:40 GMT

The code we are executing:

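# Read every Parquet file under the mount point into a single DataFrame
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
# Create (or overwrite) the table bronze.HN from the DataFrame
df.write.mode("overwrite").saveAsTable("bronze.HN")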

The error it throws:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 failed 4 times, most recent failure: Lost task 44.3 in stage 642.0 (TID 8175) (10.1.162.134 executor 1): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/g/drb/HN/HN_1.

The /mnt/g/drb/HN/ directory contains multiple Parquet files. When we load all of them into a single Spark DataFrame and display it, the data is shown correctly. However, when we try to save the DataFrame as a table, the same error is thrown.

We also tried saving the table from the Spark DataFrame by creating a temp view and then saving that view as a table.

We tried increasing the compute size (currently 24 DBU), but this did not resolve the issue.

For Parquet files in a different cloud storage container, we are able to create tables (in the hive_metastore) without problems.

So how can we store these Parquet files in a table?
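
Spark aborts the job as soon as one task has failed four times, so the error only names HN_1; other files in the directory may have the same problem. Also, display() typically materializes only a limited number of rows and may never read the problematic file, while saveAsTable has to scan everything. A minimal sketch for checking each file on its own, assuming a Databricks notebook where spark and dbutils are available. find_unreadable_files and read_fully are hypothetical helper names, and the "noop" write format (Spark 3.0+) is used only to force a full scan without writing any output:

```python
from typing import Callable, Dict, Iterable


def find_unreadable_files(
    paths: Iterable[str], read_fully: Callable[[str], None]
) -> Dict[str, str]:
    """Try to fully read each file on its own and collect the ones that fail.

    read_fully is any callable that raises if the file cannot be read.
    Returns a mapping of path -> first line of the error message.
    """
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            read_fully(path)
        except Exception as exc:  # Spark read errors surface as Python exceptions
            lines = str(exc).strip().splitlines()
            # Keep only the first line; Spark stack traces are very long
            failures[path] = lines[0] if lines else type(exc).__name__
    return failures


# Hypothetical usage in a Databricks notebook (spark and dbutils are provided there):
# paths = [f.path for f in dbutils.fs.ls("/mnt/g/drb/HN/")
#          if not f.name.startswith(("_", "."))]  # Spark skips such files anyway
#
# def read_fully(path):
#     # The "noop" sink reads every row but writes nothing
#     spark.read.parquet(path).write.format("noop").mode("overwrite").save()
#
# print(find_unreadable_files(paths, read_fully))
```

Reading each file separately also means each one is read with its own schema, so a file that only fails when combined with the others (for example, a column stored with a different type) may pass this check; that in itself points towards a schema mismatch rather than a damaged file.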

Re: Error while reading file from Cloud Storage

ashraf1395 — Thu, 20 Mar 2025 15:46:52 GMT

Try the solutions in this thread:

https://community.databricks.com/t5/data-engineering/how-can-i-convert-a-parquet-into-delta-table/td-p/14348

Re: Error while reading file from Cloud Storage

DylanStout — Thu, 27 Mar 2025 13:51:40 GMT

Our workaround was to disable the vectorized Parquet reader:

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
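
spark.conf.set changes the setting for the whole Spark session, and the vectorized reader is the default because it is usually the faster path, so it can make sense to disable it only for this one job. A minimal sketch, assuming a PySpark session object named spark; temporary_spark_conf is a hypothetical helper name that sets a conf for the duration of a with-block and restores the previous value afterwards. Because Spark evaluates lazily, both the read and the write have to run inside the block:

```python
from contextlib import contextmanager


@contextmanager
def temporary_spark_conf(spark, key, value):
    """Set a Spark conf for the duration of a with-block, then restore it."""
    # May be None if the key was never set explicitly in this session
    previous = spark.conf.get(key, None)
    spark.conf.set(key, value)
    try:
        yield
    finally:
        if previous is None:
            spark.conf.unset(key)  # fall back to Spark's built-in default
        else:
            spark.conf.set(key, previous)


# Hypothetical usage in a Databricks notebook:
# with temporary_spark_conf(spark, "spark.sql.parquet.enableVectorizedReader", "false"):
#     df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
#     df.write.mode("overwrite").saveAsTable("bronze.HN")
```

The finally clause restores the setting even if the write fails, so a failed attempt does not leave the session running with the slower reader.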
