Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Getting "Job aborted due to stage failure" SparkException when trying to download full result

Tahseen0354
Valued Contributor

I generated a result using SQL, but whenever I try to download the full result (1 million rows), it throws a SparkException. I can download the preview result but not the full result. Why? What happens under the hood when I try to download the full result?

Here is the exception:

SparkException: Job aborted due to stage failure: Task 0 in stage 133.0 failed 4 times, most recent failure: Lost task 0.3 in stage 133.0 (TID 2644) (192.***.x.x executor 6): com.databricks.sql.io.FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@someurl. It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If Delta cache is stale or the underlying files have been removed, you can invalidate Delta cache manually by restarting the cluster.

Caused by: FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@someurl. It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If Delta cache is stale or the underlying files have been removed, you can invalidate Delta cache manually by restarting the cluster.

Caused by: FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://***.snappy.parquet?upn=false&action=getStatus&timeout=90

Caused by: AbfsRestOperationException: Operation failed: "The specified path does not exist.", 404, HEAD, https://***.snappy.parquet?upn=false&action=getStatus&timeout=90

1 ACCEPTED SOLUTION

Tahseen0354
Valued Contributor

It's working now, I think it was a network issue.

8 REPLIES

Anonymous
Not applicable

@Md Tahseen Anam - Hello! My name is Piper and I'm one of the community moderators. Thanks for your question. Let's give it a bit longer to see what the community has to say. Hang in there!

Hi, thank you for your reply. It would be great if someone could shed some light on this.

User16763506477
Contributor III

Hi @Md Tahseen Anam, are there any updates happening to the table while you are downloading the results?

No updates. Could it be a network issue?

Hi @Md Tahseen Anam,

Have you tried the following steps to re-run your query and get the full results? docs here

Tahseen0354
Valued Contributor

It's working now, I think it was a network issue.

Anonymous
Not applicable

@Md Tahseen Anam - Thanks for letting us know. I'm glad things are working!

rpshgupta
New Contributor III

I am also hitting this issue again and again. I really want to understand what we can do to avoid it.
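
The exception text quoted earlier in this thread suggests one thing to try: run `REFRESH TABLE tableName` and then re-run the query. Below is a minimal sketch of that idea in Python. It is not a confirmed fix for this thread's problem. The helper names `run_with_refresh` and `looks_like_stale_file_error` are hypothetical, the marker strings are simply copied from the exception above, and `run_sql` is assumed to be any callable that actually executes a SQL string.

```python
from typing import Any, Callable

# Substrings copied from the exception text quoted in this thread.
STALE_FILE_MARKERS = (
    "FileReadException",
    "FileNotFoundException",
    "The specified path does not exist",
)


def looks_like_stale_file_error(exc: Exception) -> bool:
    """Heuristic: does the error text resemble the stale-file failure above?"""
    text = f"{type(exc).__name__}: {exc}"
    return any(marker in text for marker in STALE_FILE_MARKERS)


def run_with_refresh(
    run_sql: Callable[[str], Any],
    query: str,
    table_name: str,
    max_attempts: int = 2,
) -> Any:
    """Run `query`; on a stale-file error, issue REFRESH TABLE and retry.

    `run_sql` is a hypothetical hook: it must execute the statement it is
    given and raise if execution fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_sql(query)
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if attempt == max_attempts or not looks_like_stale_file_error(exc):
                raise
            # The remedy named in the exception message itself.
            run_sql(f"REFRESH TABLE {table_name}")
```

In a notebook, `run_sql` might be something like `lambda q: spark.sql(q).collect()`. Spark evaluates lazily, so the callable has to trigger an action for the read error to surface, and `collect()` pulls every row to the driver, so a write to storage may suit a large result better. As the exception message notes, if the Delta cache is stale or the underlying files have actually been removed, restarting the cluster is the suggested way to invalidate the cache, and a refresh alone may not be enough.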
