
Does the driver node of job compute have HA?

MarsSu
New Contributor II

I would like to confirm and discuss the HA (high availability) mechanism for the driver node of job compute. We can think of the driver node as the master node of a cluster. In AWS EMR, we can set up multiple master nodes, so that if one master node fails, another can take over quickly.

But from the official documentation, it seems that Databricks compute has only one driver node. Does this mean that the Databricks driver node has no HA mechanism?

If you have any ideas, please share them. I would appreciate it.

1 ACCEPTED SOLUTION

Accepted Solutions

karthik_p
Esteemed Contributor

@Mars Su As @Werner Stinckens mentioned, the driver is unlikely to die. One more thing: avoid putting extra load on the driver with unnecessary notebooks. If any unused notebooks are attached as part of your job, it is better to detach them.


3 REPLIES

-werners-
Esteemed Contributor III

AFAIK that is correct: if the driver dies, your job will fail.

Also check this topic.

There are ways to run Spark in HA, but I don't think it is possible on Databricks at the moment:

https://gist.github.com/aseigneurin/3af6b228490a8deab519c6aea2c209bc

If you absolutely need HA for the master/driver, I'd reach out to Databricks support.

FWIW: I do not encounter any issues with masters/drivers dying unless I write bad code (hammering the driver with a lot of data). The Spark driver itself is normally under little load; it is the workers/executors that are loaded with work.
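Since a driver failure fails the run, one possible mitigation (not true HA) is to let the job task retry automatically. Below is a minimal sketch that builds a task definition in the shape I'd expect for the Databricks Jobs API 2.1, assuming the task-level fields `max_retries`, `min_retry_interval_millis` and `retry_on_timeout`. The helper name `build_task_with_retries`, the task key and the notebook path are hypothetical and only for illustration; check the field names against the Jobs API reference for your workspace before relying on them.

```python
import json


def build_task_with_retries(task_key, notebook_path,
                            max_retries=2, min_retry_interval_ms=60_000):
    """Build a Jobs-API-style task dict that retries after a failed run.

    Hypothetical helper: it only assembles a payload, it does not call
    any Databricks endpoint.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0 in this sketch")
    return {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
        # Assumed Jobs API 2.1 task fields controlling automatic retries.
        "max_retries": max_retries,
        "min_retry_interval_millis": min_retry_interval_ms,
        "retry_on_timeout": False,
    }


if __name__ == "__main__":
    # Placeholder task key and notebook path.
    task = build_task_with_retries("nightly_etl", "/Jobs/nightly_etl")
    print(json.dumps(task, indent=2))
```

A retry starts a fresh run, so this only helps if the job is safe to re-run from the beginning (idempotent writes, for example).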

Anonymous
Not applicable

Hi @Mars Su​ 

We haven't heard from you since the last responses from @Werner Stinckens and @karthik p, and I was checking back to see if their suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
