
Spark doesn't register executors when new workers are allocated

ivanychev
Contributor II

Our pipelines sometimes get stuck (example).

Some workers get decommissioned due to spot termination and then the new workers get added.

Screenshot 2023-12-11 at 11.12.05.png

However, after (1) Spark doesn't notice new executors:

Screenshot 2023-12-11 at 11.08.56.png

And I don't know why. I'm not sure how to debug this, but here are some of my observations:

* The init script logs of the workers that Spark doesn't notice look fine; the scripts complete successfully.

* The driver logs don't show anything significant after the old executors get decommissioned. The driver simply doesn't notice the new executors.

Screenshot 2023-12-11 at 11.48.50.png

How do I debug this, and what could be the issue?

 

Sergey
1 REPLY

shan_chandra
Databricks Employee

@ivanychev - Firstly, the new workers are added and Spark does notice them; that is why the event log contains init script entries showing that the init script ran on the newly added workers. For debugging, please check the Executors tab in the Spark UI.
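
For a scriptable version of the same check, Spark's monitoring REST API exposes the data behind the Executors tab at `/api/v1/applications/[app-id]/allexecutors`. Below is a minimal Python sketch, assuming the driver's UI endpoint is reachable from wherever you run it. On Databricks the UI is typically reached through a proxy, so `base_url` and `app_id` are placeholders you would have to fill in yourself. The helper names and the sample data are made up for illustration and are not output from this cluster.

```python
import json
from urllib.request import urlopen


def fetch_executors(base_url, app_id):
    """Fetch every executor (active and dead) from Spark's monitoring REST API.

    base_url and app_id are placeholders: how you reach the driver UI
    depends on your environment.
    """
    url = f"{base_url}/api/v1/applications/{app_id}/allexecutors"
    with urlopen(url) as resp:
        return json.load(resp)


def split_executors(executors):
    """Return (active_ids, dead_ids), ignoring the driver entry."""
    workers = [e for e in executors if e["id"] != "driver"]
    active = sorted(e["id"] for e in workers if e.get("isActive"))
    dead = sorted(e["id"] for e in workers if not e.get("isActive"))
    return active, dead


# Hand-written sample shaped like the REST response (not real cluster output).
sample = [
    {"id": "driver", "hostPort": "10.0.0.1:40000", "isActive": True},
    {"id": "0", "hostPort": "10.0.0.2:40001", "isActive": False},
    {"id": "1", "hostPort": "10.0.0.3:40002", "isActive": True},
]
active, dead = split_executors(sample)
print("active:", active, "dead:", dead)
```

If the number of active executors stays below the number of running workers after the replacements come up, that would confirm the executors never registered with the driver, and the worker-side executor logs are the next place to look.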

Secondly, spot instance termination is mostly driven by the cloud provider and spot price fluctuations. Ideally, use a hybrid cluster (spot with fallback to on-demand) by setting that option on the cluster configuration page.
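
If the cluster is defined through the Clusters API or a job spec instead of the UI, the same option can be set in the cluster definition. The thread doesn't say which cloud is in use, so this sketch assumes AWS, where the setting lives under `aws_attributes` (`availability` and `first_on_demand`); other clouds use a different attributes block. The helper name `with_spot_fallback` and the example spec are hypothetical.

```python
import copy


def with_spot_fallback(cluster_spec, first_on_demand=1):
    """Return a copy of a cluster spec that uses spot with on-demand fallback.

    Assumes an AWS workspace. first_on_demand > 0 keeps the driver
    on an on-demand instance.
    """
    spec = copy.deepcopy(cluster_spec)
    aws = spec.setdefault("aws_attributes", {})
    aws["availability"] = "SPOT_WITH_FALLBACK"
    aws["first_on_demand"] = first_on_demand
    return spec


# Hypothetical starting spec; only the aws_attributes block is changed.
base = {"num_workers": 4, "aws_attributes": {"availability": "SPOT"}}
print(with_spot_fallback(base))
```

Note that this only reduces how often workers disappear; it does not explain why replacement executors fail to register.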

Reference: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#on-demand-and-spot-instanc...

Thanks, Shan
