Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Long running jobs get lost

jenshumrich
Contributor

Hello,
I scheduled a long-running job and, surprisingly, it seems to neither terminate (so the cluster cannot shut down) nor continue running, even though its state is still "Running":

jenshumrich_0-1712742957610.png
But the truth is that the job has miserably failed:

jenshumrich_2-1712743008070.png

jenshumrich_3-1712743098546.png

Sadly, this means the automation is not working. Any hint would be appreciated.

 

2 REPLIES

shan_chandra
Esteemed Contributor

@jenshumrich - There is not much information to go on. However, to begin with, can you please check whether the task has enough parallelism to execute (spark.sql.shuffle.partitions and the number of cores on the cluster)?
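
One way to run that check is sketched below. It is a minimal sketch, assuming it runs in a notebook attached to the same cluster, where `spark` is the predefined session and `spark.sparkContext` is accessible. The helper names `shuffle_waves` and `report_parallelism` are hypothetical, and `defaultParallelism` is used only as an approximation of the cluster's total cores.

```python
import math


def shuffle_waves(shuffle_partitions: int, total_cores: int) -> int:
    """Return how many sequential rounds of tasks a shuffle stage needs
    when each core runs one task at a time (hypothetical helper)."""
    if shuffle_partitions <= 0 or total_cores <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(shuffle_partitions / total_cores)


def report_parallelism(spark) -> None:
    """Print the shuffle partition setting next to the default parallelism."""
    # The config value comes back as a string; it may not be a plain number.
    raw = spark.conf.get("spark.sql.shuffle.partitions")
    # Assumption: defaultParallelism roughly matches the total executor cores.
    cores = spark.sparkContext.defaultParallelism
    print(f"spark.sql.shuffle.partitions = {raw}")
    print(f"defaultParallelism = {cores}")
    if raw.isdigit():
        print(f"task waves per shuffle stage = {shuffle_waves(int(raw), cores)}")
    else:
        print("setting is not a plain number, skipping the wave estimate")


# In a notebook where `spark` already exists:
# report_parallelism(spark)
```

A large number of waves is not a problem by itself, and a small number does not rule out skew; the output is only a starting point for comparing the two settings mentioned above.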

Lakshay
Esteemed Contributor

Have you looked at the SQL plan to see what Spark job 72 was doing?
