Error reading in Parquet file
01-14-2023 04:09 AM
I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I get the error below:
spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet")
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.6 executor 0): org.apache.spark.SparkException: Exception thrown in awaitResult:
I searched on Google and tried the suggestions from several posts (setting spark.driver.maxResultSize to 20g, and adding the inferSchema option), but I keep getting the same error. The file I am trying to read is only 12 KB.
I tried the following runtime versions on my Databricks cluster:
11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)
11.1 (includes Apache Spark 3.3.0, Scala 2.12)
10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
Can anyone please advise how to resolve this issue?
Labels: Azure databricks, Error, LTS, Parquet File, TID
01-14-2023 10:59 AM
This error may be related to a credential issue.
You can try this code:
spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "<your-access-key>")
spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet")
To avoid hard-coding <your-access-key> in the notebook, you can store it in a secret scope (see the Databricks documentation on secret scopes for setup instructions).
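Here is a minimal sketch of reading the key from a secret scope instead of pasting it into the notebook. The helper names, the scope name, and the secret name are hypothetical. `spark` and `dbutils` are the objects Databricks provides in a notebook, passed in explicitly so the helper stays self-contained.

```python
def account_key_conf(storage_account: str) -> str:
    """Build the Spark conf name for ADLS Gen2 account-key authentication."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


def configure_account_key(spark, dbutils, storage_account, scope, key_name):
    """Read the access key from a secret scope and set it on the Spark session.

    `scope` and `key_name` are whatever names you chose when creating the
    secret; nothing here is specific to this thread.
    """
    # dbutils.secrets.get returns the secret value without printing it.
    access_key = dbutils.secrets.get(scope=scope, key=key_name)
    conf_name = account_key_conf(storage_account)
    spark.conf.set(conf_name, access_key)
    return conf_name
```

In a notebook this could be called as `configure_account_key(spark, dbutils, "<storage-account-name>", "<scope-name>", "<secret-name>")` before the `spark.read.parquet(...)` call. Note that the conf name takes the storage account name, not the container name.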
01-14-2023 01:44 PM
Thanks for your answer, but I was already using the same kind of code with an access key.
01-14-2023 01:54 PM
I tried again, but got the same error:
spark.conf.set("fs.azure.account.key.<ContainerName>.dfs.core.windows.net",ACCESS_KEY)
spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet")
01-30-2023 02:51 PM
Can you access the executor logs? While your cluster is up and running, you can open the logs for each executor. For example, the error shows:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.6 executor 0): org.apache.spark.SparkException: Exception thrown in awaitResult:
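Go to the logs for executor 0 and check why the task failed there. The driver-side message only says "Exception thrown in awaitResult"; the underlying cause should appear in the executor's log.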
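If it helps to find which executor to look at, the failure message can be parsed with a small regular expression. This is only a sketch that assumes the message has the same shape as the one quoted above; the function name and the pattern are illustrative and not part of any Spark API.

```python
import re

# Matches the "Lost task ..." fragment of a Spark stage-failure message.
LOST_TASK_RE = re.compile(
    r"Lost task (?P<task>[\d.]+) in stage (?P<stage>[\d.]+) "
    r"\(TID (?P<tid>\d+)\) \((?P<host>\S+) executor (?P<executor>\w+)\)"
)


def parse_lost_task(message: str):
    """Return task, stage, TID, host and executor ID, or None if not found."""
    match = LOST_TASK_RE.search(message)
    return match.groupdict() if match else None


error = (
    "org.apache.spark.SparkException: Job aborted due to stage failure: "
    "Task 0 in stage 0.0 failed 4 times, most recent failure: "
    "Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.6 executor 0): "
    "org.apache.spark.SparkException: Exception thrown in awaitResult:"
)

info = parse_lost_task(error)
print(info)
```

For the message in this thread, the parsed executor ID is `0` and the host is `10.139.64.6`, so that executor's logs are the ones to open.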