Community Discussions

Long running jobs get lost

jenshumrich
New Contributor III

Hello,
I scheduled a long-running job and, surprisingly, it seems to neither terminate (so the cluster cannot shut down) nor continue running, even though its state is still "Running":

jenshumrich_0-1712742957610.png
But in reality the job has failed miserably:

jenshumrich_2-1712743008070.png

jenshumrich_3-1712743098546.png

Sadly, this means the automation is not working. Any hints would be appreciated.

 

2 REPLIES

shan_chandra
Esteemed Contributor

@jenshumrich - There is not much information to go on. To begin with, however, can you please check whether the task has enough parallelism to execute (spark.sql.shuffle.partitions relative to the number of cores on the cluster)?
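A rough way to run that check from a notebook could look like the sketch below. The helper names (summarize_parallelism, summarize_session) are made up for illustration, and the sketch assumes that spark.sql.shuffle.partitions is set to a plain number and that sparkContext.defaultParallelism is a reasonable stand-in for the total number of cores, which may not hold on every cluster.

```python
import math


def summarize_parallelism(shuffle_partitions: int, total_cores: int) -> dict:
    """Compare the shuffle partition count with the available task slots."""
    if shuffle_partitions < 1 or total_cores < 1:
        raise ValueError("shuffle_partitions and total_cores must be positive")
    return {
        "shuffle_partitions": shuffle_partitions,
        "total_cores": total_cores,
        # Rounds of tasks a shuffle stage needs if each core runs one task at a time.
        "waves": math.ceil(shuffle_partitions / total_cores),
        # Cores left without work when there are fewer partitions than cores.
        "idle_cores": max(total_cores - shuffle_partitions, 0),
    }


def summarize_session(spark) -> dict:
    """Read both values from an active SparkSession (hypothetical helper)."""
    # spark.conf.get returns a string; this assumes a numeric value such as "200".
    shuffle_partitions = int(spark.conf.get("spark.sql.shuffle.partitions"))
    # Assumption: defaultParallelism approximates the total executor cores.
    total_cores = spark.sparkContext.defaultParallelism
    return summarize_parallelism(shuffle_partitions, total_cores)
```

In a notebook, calling summarize_session(spark) returns the comparison as a dictionary. A large number of idle cores, or a very high number of waves, would be a hint that the partition setting and the cluster size do not match.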

Lakshay
Esteemed Contributor

Have you looked at the SQL plan to see what Spark job 72 was doing?