Data Engineering

Error while reading file from Cloud Storage

DylanStout
Contributor

The code we are executing: 

df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode("overwrite").saveAsTable("bronze.HN")

The error it throws:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 failed 4 times, most recent failure: Lost task 44.3 in stage 642.0 (TID 8175) (10.1.162.134 executor 1): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/g/drb/HN/HN_1.

The /mnt/g/drb/HN/ directory contains multiple Parquet files. When we load all of these files into a single Spark DataFrame and display it, the data is shown correctly. However, when we try to save it as a table, the same error is thrown.
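Worth noting: display() only materializes a small sample of rows, so a file Spark cannot read may go undetected until an action has to scan every file. A minimal check (not from the original post) that forces a full scan without any table write:

# Force a full scan of every Parquet file under the mount.
# If this raises the same FileReadException, the problem is in
# reading the files themselves, not in saveAsTable.
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.count()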

How we tried to save the table from the Spark DataFrame: create a temp view, then save it as a table (roughly as sketched below).
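A minimal sketch of that approach, with an illustrative view name (hn_temp is hypothetical, not from the post):

# Register the DataFrame as a temporary view, then create the
# table from it with SQL.
df.createOrReplaceTempView("hn_temp")
spark.sql("CREATE TABLE bronze.HN AS SELECT * FROM hn_temp")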

We tried increasing the compute size (currently 24 DBU), but that did not resolve the issue.

For other Parquet files in a different cloud storage container we are able to create tables correctly (in the hive_metastore).

So how can we store these Parquet files in a table?
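One way to isolate the offending file (a hedged sketch, not something tried in the thread) is to skip unreadable files and tag each row with its source path:

from pyspark.sql.functions import input_file_name

# Skip files Spark cannot read instead of failing the whole job,
# then list the files that were actually readable; any file in the
# mount that is missing from this list is a suspect.
df = (spark.read.format("parquet")
      .option("ignoreCorruptFiles", "true")
      .load("/mnt/g/drb/HN/"))
df.select(input_file_name()).distinct().show(truncate=False)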

1 REPLY

ashraf1395
Valued Contributor III
