Error while reading file from Cloud Storage
The code we are executing:
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode('overwrite').saveAsTable("bronze.HN")
The error it throws:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 failed 4 times, most recent failure: Lost task 44.3 in stage 642.0 (TID 8175) (10.1.162.134 executor 1): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/g/drb/HN/HN_1.
The directory /mnt/g/drb/HN/ contains multiple parquet files. Loading all of them into a single Spark DataFrame and displaying it works correctly. However, the same error is thrown when we try to save the DataFrame as a table.
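For context, this is roughly how we verify the load in the notebook (just a sketch of the check described above):

# Load every parquet file under the mount point and render it in the notebook
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
display(df)  # this renders the rows without any error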
How we tried to save the table from the Spark DataFrame: we created a temp view and then saved it as a table (a rough sketch follows below).
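A minimal sketch of that temp-view approach; the view name here is a placeholder, not the exact code we ran:

# Register the DataFrame as a temporary view, then create the table from it
df.createOrReplaceTempView("hn_temp_view")
spark.sql("CREATE OR REPLACE TABLE bronze.HN AS SELECT * FROM hn_temp_view")

This fails with the same FileReadException as the direct saveAsTable call.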
We tried increasing the compute size (it is currently at 24 DBU), but that did not resolve the issue.
For other parquet files in a different cloud storage container we are able to create tables correctly (in the hive_metastore).
How can we store these parquet files in a table?
Labels: Spark

