I am using an interactive cluster to run a frequent batch job (every 15 minutes).
After some time (for example, 6 hours), the event log starts repeatedly showing "Driver is up but is not responsive, likely due to GC." and all jobs start failing.
If the cluster is restarted, all jobs run successfully again.
Can someone help me understand the root cause and how to resolve this without restarting the cluster?
Thanks in advance.
#drivernotresponsive #GC