Spark is not executing any tasks

Diogo_W
New Contributor III

I have an issue where Spark is not submitting any tasks, on any workspace or cluster, even SQL Warehouse.

Even very simple code hangs forever.

[two screenshots attached in the original post]

Has anyone ever faced something similar? Our infra is on AWS.

 

3 REPLIES

Kaniz
Community Manager

Hi @Diogo_W ,

Indeed, if Spark isn't submitting tasks in your workspace or cluster, consider these steps:

  1. Resource Availability:

    • Ensure you have enough resources available in your account or group.
    • Check your quota and usage on the "Account" page.
  2. Cluster Status:

    • Verify that your Spark cluster is running.
    • Check the status on the "Clusters" page and start it if necessary.
  3. Network and VPN:

    • Check for network issues if you're using a VPN or in a remote location.
    • Ensure a stable and reliable VPN connection.
  4. Code Validation:

    • Confirm your code is correct. Run a simple Spark job or a sample PySpark script to check for errors (see the sketch after this list).
  5. Logs Inspection:

    • Examine Spark logs for error messages or exceptions; access the logs from the notebook by clicking the "Logs" button.
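
As a quick sanity check for step 4, here is a minimal sketch (in a Databricks notebook, spark is already defined; the explicit session setup below is only needed outside a notebook):

# Minimal PySpark sanity check: forces a real job so tasks must be scheduled.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # no-op inside a Databricks notebook

df = spark.range(100)   # tiny in-memory DataFrame, no external dependencies
print(df.count())       # triggers a job; this hangs if no worker accepts tasks

If even this trivial action hangs, the problem is on the cluster side rather than in your code.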

If these steps don't resolve the issue, consider filing a support ticket with Databricks support for further assistance.

Diogo_W
New Contributor III

Hi Kaniz, thanks for the reply.

I went through the logs and I see this:

KeyboardInterrupt:
23/10/26 21:06:04 INFO ProgressReporter$: Removed result fetcher for 7389618138579564799_6933402728921115182_ee7173b16c654fea9ca6968ef33e5530
23/10/26 21:06:04 INFO PythonDriverWrapper: Stopping streams for commandId pattern: CommandIdPattern(7389618138579564799,None,Some(ee7173b16c654fea9ca6968ef33e5530)).
23/10/26 21:06:06 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 1.0, New Ema: 0.85
23/10/26 21:06:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
23/10/26 21:06:09 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 0.85, New Ema: 0.0
23/10/26 21:06:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Looks like the cluster is not getting enough resources, as you mentioned. Any idea how to fix it?
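
For context, that TaskSchedulerImpl warning generally means no executors ever registered with the driver, rather than the code being wrong. A minimal check from a notebook, using only public PySpark APIs (assuming the usual spark session object):

# On a coarse-grained scheduler backend, defaultParallelism falls back
# to 2 when no executor cores are registered, so a suspiciously small
# value on a multi-worker cluster suggests the workers never joined.
print(spark.sparkContext.defaultParallelism)

# Stages that stay "active" while a trivial action sits running mean
# tasks were created but never placed on an executor.
print(spark.sparkContext.statusTracker().getActiveStageIds())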

Diogo_W
New Contributor III
Accepted Solution

Found the solution:

It turned out to be an issue with the Security Groups: internal security-group communication was not open on all ports for TCP and UDP. After fixing that, the jobs ran fine. It seems we did require more workers, too.
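
For anyone hitting the same thing, here is a hedged sketch of the security-group fix with boto3; the group ID and region are placeholders, and Databricks' documentation lists the exact rules your deployment needs:

import boto3

# Placeholder values: substitute the security group attached to your
# Databricks cluster nodes and your workspace's AWS region.
SG_ID = "sg-0123456789abcdef0"
ec2 = boto3.client("ec2", region_name="us-east-1")

# Allow the security group to reach itself on all TCP and UDP ports,
# which is what the cluster's internal node-to-node traffic needs.
for proto in ("tcp", "udp"):
    ec2.authorize_security_group_ingress(
        GroupId=SG_ID,
        IpPermissions=[{
            "IpProtocol": proto,
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": SG_ID}],
        }],
    )

Note that authorize_security_group_ingress raises an error if an identical rule already exists, so this is one-time setup rather than idempotent code.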
