Data Engineering

Error while reading file from Cloud Storage

DylanStout
Contributor

The code we are executing: 

df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode("overwrite").saveAsTable("bronze.HN")

The error it throws:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 failed 4 times, most recent failure: Lost task 44.3 in stage 642.0 (TID 8175) (10.1.162.134 executor 1): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/g/drb/HN/HN_1.

The /mnt/g/drb/HN/ directory contains multiple Parquet files. When we load all of these files into a single Spark DataFrame and display it, the data is shown correctly. However, when we try to save it as a table, the same error is thrown.
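Worth noting: display() only materializes a small sample of rows, so a file Spark cannot read may go undetected until an action has to scan every file. A minimal check (not from the original post) that forces a full scan without any table write:

# Force a full scan of every Parquet file under the mount.
# If this raises the same FileReadException, the problem is in
# reading the files themselves, not in saveAsTable.
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.count()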

How we tried to save the table from the Spark DataFrame: create a temp view, then save it as a table (roughly as sketched below).
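A minimal sketch of that approach, with an illustrative view name (hn_temp is hypothetical, not from the post):

# Register the DataFrame as a temporary view, then create the
# table from it with SQL.
df.createOrReplaceTempView("hn_temp")
spark.sql("CREATE TABLE bronze.HN AS SELECT * FROM hn_temp")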

We tried increasing the compute size (currently 24 DBU), but that did not resolve the issue.

For other Parquet files in a different cloud storage container we are able to create tables correctly (in the hive_metastore).

So how can we store these Parquet files in a table?
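One way to isolate the offending file (a hedged sketch, not something tried in the thread) is to skip unreadable files and tag each row with its source path:

from pyspark.sql.functions import input_file_name

# Skip files Spark cannot read instead of failing the whole job,
# then list the files that were actually readable; any file in the
# mount that is missing from this list is a suspect.
df = (spark.read.format("parquet")
      .option("ignoreCorruptFiles", "true")
      .load("/mnt/g/drb/HN/"))
df.select(input_file_name()).distinct().show(truncate=False)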

1 REPLY

ashraf1395
Valued Contributor III
