Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark is not executing any tasks

Diogo_W
New Contributor III

I have an issue where Spark is not submitting any tasks, on any workspace or cluster, even on a SQL Warehouse.

Even very simple code hangs forever.

(Two screenshots of the hung commands attached.)

Has anyone ever faced something similar? Our infra is on AWS.
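
For illustration, even something as trivial as the snippet below (a hypothetical stand-in for the commands in the screenshots) just sits there and never starts any tasks:

# Hypothetical minimal example - any action that triggers a Spark job hangs
df = spark.range(10)   # tiny DataFrame, nothing heavy
print(df.count())      # this action never completes; no tasks are submitted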

 

2 REPLIES

Diogo_W
New Contributor III

Hi Kaniz, thanks for the reply.

I went through the logs and I see this:

KeyboardInterrupt:
23/10/26 21:06:04 INFO ProgressReporter$: Removed result fetcher for 7389618138579564799_6933402728921115182_ee7173b16c654fea9ca6968ef33e5530
23/10/26 21:06:04 INFO PythonDriverWrapper: Stopping streams for commandId pattern: CommandIdPattern(7389618138579564799,None,Some(ee7173b16c654fea9ca6968ef33e5530)).
23/10/26 21:06:06 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 1.0, New Ema: 0.85
23/10/26 21:06:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
23/10/26 21:06:09 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 0.85, New Ema: 0.0
23/10/26 21:06:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Looks like the cluster is not getting enough resources, as you mentioned. Any idea how to fix it?
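
In case it helps with debugging: a quick way to check whether any executors have registered with the driver at all is something like the snippet below. It goes through the internal JVM SparkContext handle, so treat it as a diagnostic sketch rather than a stable API:

sc = spark.sparkContext
# getExecutorMemoryStatus has one entry per executor plus one for the driver,
# so a healthy cluster with N workers should report N + 1 endpoints.
n_endpoints = sc._jsc.sc().getExecutorMemoryStatus().size()
print(f"Registered endpoints (executors + driver): {n_endpoints}")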

Diogo_W
New Contributor III

Found the solution:

 

It turned out to be an issue with the AWS security groups: internal security-group communication was not open on all ports for TCP and UDP. After fixing that, the jobs ran fine. It also turned out we needed more workers.
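
For anyone hitting the same thing: the fix amounts to adding self-referencing ingress rules to the workspace security group so the cluster nodes can reach each other on all TCP and UDP ports. A rough boto3 sketch (the region and security group ID below are placeholders, not the real values):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region

workspace_sg = "sg-0123456789abcdef0"                # placeholder workspace security group ID

# Self-referencing rules: allow all TCP and UDP traffic between members of the group.
for proto in ("tcp", "udp"):
    ec2.authorize_security_group_ingress(
        GroupId=workspace_sg,
        IpPermissions=[{
            "IpProtocol": proto,
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": workspace_sg}],
        }],
    )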
