Spark is not executing any tasks

Diogo_W
New Contributor III

I have an issue where Spark is not submitting any tasks, on any workspace or cluster, even SQL Warehouse.

Even very simple code hangs forever.

[two screenshots attached in the original post]

Has anyone ever faced something similar? Our infra is on AWS.

 

3 REPLIES

Kaniz
Community Manager

Hi @Diogo_W ,

Indeed, if Spark isn't submitting tasks in your workspace or cluster, consider these steps:

  1. Resource Availability:

    • Ensure you have enough resources available in your account or group.
    • Check your quota and usage on the "Account" page.
  2. Cluster Status:

    • Verify that your Spark cluster is running.
    • Check the status on the "Clusters" page and start it if necessary.
  3. Network and VPN:

    • Check for network issues if you're using a VPN or in a remote location.
    • Ensure a stable and reliable VPN connection.
  4. Code Validation:

    • Confirm your code is correct. Run a simple Spark job or a sample PySpark script to check for errors (see the sketch after this list).
  5. Logs Inspection:

    • Examine Spark logs for error messages or exceptions; access the logs from the notebook by clicking the "Logs" button.
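
As a quick sanity check for step 4, here is a minimal sketch (in a Databricks notebook, spark is already defined; the explicit session setup below is only needed outside a notebook):

# Minimal PySpark sanity check: forces a real job so tasks must be scheduled.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # no-op inside a Databricks notebook

df = spark.range(100)   # tiny in-memory DataFrame, no external dependencies
print(df.count())       # triggers a job; this hangs if no worker accepts tasks

If even this trivial action hangs, the problem is on the cluster side rather than in your code.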

If these steps don't resolve the issue, consider filing a support ticket with Databricks support for further assistance.

Diogo_W
New Contributor III

Hi Kaniz, thanks for the reply.

I went through the logs and I see this:

KeyboardInterrupt:
23/10/26 21:06:04 INFO ProgressReporter$: Removed result fetcher for 7389618138579564799_6933402728921115182_ee7173b16c654fea9ca6968ef33e5530
23/10/26 21:06:04 INFO PythonDriverWrapper: Stopping streams for commandId pattern: CommandIdPattern(7389618138579564799,None,Some(ee7173b16c654fea9ca6968ef33e5530)).
23/10/26 21:06:06 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 1.0, New Ema: 0.85
23/10/26 21:06:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
23/10/26 21:06:09 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 0.85, New Ema: 0.0
23/10/26 21:06:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Looks like the cluster is not getting enough resources, as you mentioned. Any idea how to fix it?
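
For context, that TaskSchedulerImpl warning generally means no executors ever registered with the driver, rather than the code being wrong. A minimal check from a notebook, using only public PySpark APIs (assuming the usual spark session object):

# On a coarse-grained scheduler backend, defaultParallelism falls back
# to 2 when no executor cores are registered, so a suspiciously small
# value on a multi-worker cluster suggests the workers never joined.
print(spark.sparkContext.defaultParallelism)

# Stages that stay "active" while a trivial action sits running mean
# tasks were created but never placed on an executor.
print(spark.sparkContext.statusTracker().getActiveStageIds())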

Diogo_W
New Contributor III
Accepted Solution

Found the solution:

It turned out to be an issue with the Security Groups: internal security-group communication was not open on all ports for TCP and UDP. After fixing that, the jobs ran fine. It seems we did require more workers, too.
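
For anyone hitting the same thing, here is a hedged sketch of the security-group fix with boto3; the group ID and region are placeholders, and Databricks' documentation lists the exact rules your deployment needs:

import boto3

# Placeholder values: substitute the security group attached to your
# Databricks cluster nodes and your workspace's AWS region.
SG_ID = "sg-0123456789abcdef0"
ec2 = boto3.client("ec2", region_name="us-east-1")

# Allow the security group to reach itself on all TCP and UDP ports,
# which is what the cluster's internal node-to-node traffic needs.
for proto in ("tcp", "udp"):
    ec2.authorize_security_group_ingress(
        GroupId=SG_ID,
        IpPermissions=[{
            "IpProtocol": proto,
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": SG_ID}],
        }],
    )

Note that authorize_security_group_ingress raises an error if an identical rule already exists, so this is one-time setup rather than idempotent code.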
