Data Engineering

Spark image failed to download or does not exist

SDG_Peter
New Contributor III

Good morning, and thank you for the support

In our scheduled job one cluster failed to start with the following error:

```

Run result unavailable: job failed with error message

Unexpected failure while waiting for the cluster to be ready.Cause Unexpected state for cluster: INVALID_ARGUMENT(CLIENT_ERROR): databricks_error_message: Container setup failed because of an invalid request: Spark image "release__10.4.x-snapshot-scala2.12__databricks-universe__head__dab7230__ee00e81__jenkins__9b44ccb__format-2" failed to download or does not exist.

```

This job and its configuration had worked for the past month, and the runs in the days after we received this error succeeded as well.

The job spins up several clusters, and on the day of the failure only one of them hit the error.

Where could we find more information or access some log files?

Is there a way to automatically retry instantiating the cluster? Is setting the `Task retry policy` sufficient to work around this error, or would a retry simply find the cluster in an error state?

I have looked through https://learn.microsoft.com/en-us/azure/databricks/kb/clusters/termination-reasons but could not find a related issue.
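For anyone searching later: cluster state transitions can be pulled from the Clusters API events endpoint (`POST /api/2.0/clusters/events`). A minimal sketch, assuming a workspace host and personal access token in environment variables; the cluster ID below is a placeholder, not a real one:

```python
import json
import os

# Placeholder cluster ID -- substitute the job cluster's ID from the run page.
CLUSTER_ID = "0000-000000-example0"

# Request body for POST /api/2.0/clusters/events: newest events first.
payload = {
    "cluster_id": CLUSTER_ID,
    "order": "DESC",
    "limit": 50,
}

host = os.environ.get("DATABRICKS_HOST")    # e.g. https://adb-....azuredatabricks.net
token = os.environ.get("DATABRICKS_TOKEN")  # personal access token

if host and token:
    import requests  # only needed when actually calling the API
    resp = requests.post(
        f"{host}/api/2.0/clusters/events",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    for event in resp.json().get("events", []):
        print(event["timestamp"], event["type"])
else:
    # No credentials set: just show the request body that would be sent.
    print(json.dumps(payload, indent=2))
```

The terminating event in that list usually carries the same `databricks_error_message` as the run page, plus the preceding lifecycle events.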

Cheers

7 REPLIES

Harun
Honored Contributor

@Pietro Maria Nobili​ 

You can use the Task retry policy; it will start the job cluster again, since the scope of a job cluster ends when the task completes or fails.
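To make that concrete, here is a sketch of the retry fields on a task in a Jobs API 2.1 job definition. The field names (`max_retries`, `min_retry_interval_millis`, `retry_on_timeout`) come from the Jobs API; the task key, notebook path, and node type are made-up examples:

```python
# Sketch of a Jobs API 2.1 task with a retry policy. Because the job
# cluster is torn down with the failed attempt, each retry provisions a
# fresh cluster rather than reusing the one that errored out.
task = {
    "task_key": "ingest",                                # hypothetical name
    "notebook_task": {"notebook_path": "/Jobs/ingest"},  # hypothetical path
    "new_cluster": {
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",  # assumption: an Azure node type
        "num_workers": 2,
    },
    "max_retries": 3,                    # retry up to 3 times on failure
    "min_retry_interval_millis": 60000,  # wait 1 minute between attempts
    "retry_on_timeout": False,
}
```

A non-zero `min_retry_interval_millis` is worth setting here, so a transient image-download failure has time to clear before the next attempt.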

AndriusVitkausk
New Contributor III

We're experiencing the same issue in our production environment, with pretty much the same error: one cluster on the 9.1 runtime and one on 11.3, both LTS. The pipelines recover on subsequent runs, so this looks intermittent and possibly an issue on the Databricks side. Would be good to get an answer.

viktor_fulop
New Contributor II

Hi!

We encounter exactly the same issue, using 10.4 LTS image on Azure EU.

Our workflows start between 5 and 9 AM CET, and multiple of them fail each day.

A simple retry solves the issue, but it is very frustrating. Please give us an update about this.

YoshiCoppens
New Contributor II

Similarly here, on Azure, on 10.4

Kajetan
New Contributor II

Same issue on my end; this seems to be a wider problem?

Azure, 10.4 LTS ML runtime... @Jose Gonzalez​ , @Landan George​ or anyone, please follow up on this topic. In my case, issue appears on PROD env.

Has anyone solved this issue?

killjoy
New Contributor III

Same issue here, workflow has been working without any problems and this morning we got the error:

Run result unavailable: job failed with error message

Unexpected failure while waiting for the cluster (--------------) to be ready.Cause Unexpected state for cluster (----------------: INVALID_ARGUMENT(CLIENT_ERROR): databricks_error_message:Container setup failed because of an invalid request: Spark image "release__9.1.x-snapshot-scala2.12__databricks-universe__head__9e5c85c__96835dd__jenkins__140ce8f__format-2" failed to download or does not exist.

LandanG
Honored Contributor

@Rita Fernandes​ @Kajetan Gęgotek​ @Yoshi Coppens​ @Viktor Fulop​ @Andrius Vitkauskas​ @Pietro Maria Nobili​ 

It looks like an issue with Azure account limits; the Databricks eng team is looking into it. Apart from retries, I'd suggest not running jobs exactly on the hour: instead of scheduling a job at 1:00 AM, run it at 1:17 AM (for example), which should help.
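For reference, a sketch of what that off-the-hour schedule looks like in a job's settings. Databricks schedules use a Quartz cron expression (seconds, minutes, hours, day-of-month, month, day-of-week); the timezone below is just an example:

```python
# Sketch: a Jobs API schedule block that fires daily at 01:17 instead of
# 01:00, to avoid the top-of-the-hour rush. Quartz cron field order is
# seconds / minutes / hours / day-of-month / month / day-of-week.
schedule = {
    "quartz_cron_expression": "0 17 1 * * ?",
    "timezone_id": "Europe/Paris",  # assumption: pick your own timezone
    "pause_status": "UNPAUSED",
}
```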

If I hear more I'll respond to this thread.

Thanks @Kajetan Gęgotek​ for tagging me and bringing this to my attention
