
Spark doesn't register executors when new workers are allocated

ivanychev
Contributor

Our pipelines sometimes get stuck (example).

Some workers get decommissioned due to spot instance termination, and then new workers are added.

Screenshot 2023-12-11 at 11.12.05.png

However, after the point marked (1), Spark doesn't notice the new executors:

Screenshot 2023-12-11 at 11.08.56.png

I don't know why, and I don't understand how to debug this, but here are some of my observations:

* The init script logs of the workers that Spark doesn't notice look fine; the scripts complete successfully.

* The driver logs don't show anything significant after the old executors are decommissioned. The driver simply doesn't notice the new executors.

Screenshot 2023-12-11 at 11.48.50.png

How do I debug this, and what could be the issue?

shan_chandra
Honored Contributor III

@ivanychev First, the new workers are added and Spark does notice them; that is why the event log contains init script entries showing that the init script ran on the newly added workers. For debugging, please check the Executors tab in the Spark UI.
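The Executors tab is backed by Spark's monitoring REST API, so the same check can be scripted. The sketch below is a minimal example, assuming the Spark UI of the running application is reachable at `ui_base_url` (open-source Spark serves it on the driver at port 4040 by default; on Databricks, access may go through a proxy that needs authentication, which this sketch does not handle). It reads `/api/v1/applications/<app-id>/allexecutors`, which includes removed executors, separates active from removed ones, and reports which expected worker hosts never got an active executor. The helper names and the `expected_hosts` input are hypothetical; you would fill the hosts in from your cluster's worker list. From a notebook, `spark.sparkContext.uiWebUrl` and `spark.sparkContext.applicationId` may supply the URL and app id, though the URL is not guaranteed to be reachable on Databricks.

```python
import json
import urllib.request


def fetch_executors(ui_base_url: str, app_id: str) -> list[dict]:
    """Fetch all executors (live and removed) from Spark's monitoring REST API."""
    url = f"{ui_base_url.rstrip('/')}/api/v1/applications/{app_id}/allexecutors"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_executors(executors: list[dict]) -> dict:
    """Split executors into active hosts and (host, removeReason) pairs, skipping the driver."""
    active, removed = [], []
    for ex in executors:
        if ex.get("id") == "driver":
            continue
        # hostPort looks like "10.0.0.3:35000"; keep only the host part.
        host = ex.get("hostPort", "").rsplit(":", 1)[0]
        if ex.get("isActive"):
            active.append(host)
        else:
            removed.append((host, ex.get("removeReason")))
    return {"active": active, "removed": removed}


def unregistered_hosts(expected_hosts: list[str], executors: list[dict]) -> list[str]:
    """Hypothetical helper: worker hosts with no active executor registered."""
    active = set(summarize_executors(executors)["active"])
    return sorted(set(expected_hosts) - active)


if __name__ == "__main__":
    # Sample payload shaped like the REST API response (values are made up).
    sample = [
        {"id": "driver", "hostPort": "10.0.0.1:40000", "isActive": True},
        {"id": "1", "hostPort": "10.0.0.2:35000", "isActive": False,
         "removeReason": "worker lost"},
        {"id": "2", "hostPort": "10.0.0.3:35000", "isActive": True},
    ]
    print(summarize_executors(sample))
    print(unregistered_hosts(["10.0.0.3", "10.0.0.4"], sample))
```

If a newly added worker shows up in `unregistered_hosts` while its init script finished, that narrows the problem to executor launch or registration on that host rather than to node provisioning.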

Second, spot instance termination is mostly driven by the cloud provider and by spot price fluctuations. Ideally, use hybrid clusters (spot with fallback to on-demand) by setting that flag on the cluster configuration page.

Reference: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#on-demand-and-spot-instanc...
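The same setting can also be expressed in a cluster spec for the Databricks Clusters API. This is a minimal sketch assuming an AWS workspace, where it corresponds to `aws_attributes.availability = "SPOT_WITH_FALLBACK"`, and `first_on_demand` places the first N nodes (the driver included, when N > 0) on on-demand instances; Azure workspaces use `azure_attributes` with `SPOT_WITH_FALLBACK_AZURE` instead. The cluster name, runtime version, and node type are placeholders, not recommended values.

```python
import json


def hybrid_cluster_spec(num_workers: int, first_on_demand: int = 1) -> dict:
    """Build a cluster spec using spot workers with on-demand fallback (AWS assumed)."""
    if num_workers < 0 or first_on_demand < 0:
        raise ValueError("num_workers and first_on_demand must be non-negative")
    return {
        "cluster_name": "my-pipeline-cluster",            # placeholder
        "spark_version": "<databricks-runtime-version>",  # placeholder
        "node_type_id": "<node-type>",                    # placeholder
        "num_workers": num_workers,
        "aws_attributes": {
            # Request spot instances, falling back to on-demand
            # when spot instances cannot be acquired.
            "availability": "SPOT_WITH_FALLBACK",
            # The first N nodes (driver first) run on on-demand instances.
            "first_on_demand": first_on_demand,
        },
    }


if __name__ == "__main__":
    print(json.dumps(hybrid_cluster_spec(num_workers=8), indent=2))
```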

Thanks, Shan
