05-11-2023 07:23 PM
I would like to confirm and discuss the HA mechanism for the driver node of a job compute, because we can think of the driver node as the master node of the cluster. In AWS EMR, we can set up multiple master nodes so that if one master node fails, another can take over quickly.
But from the official documentation, it seems a Databricks compute has only one driver node. Does this mean the Databricks driver node has no HA mechanism?
If you have any ideas, please share or discuss them. I would appreciate it.
05-12-2023 06:42 AM
@Mars Su As @Werner Stinckens mentioned, there is little chance of the driver dying. One more thing: avoid adding unnecessary load from notebooks; if any unused notebooks are attached as part of your job, it is better to detach them.
05-12-2023 12:50 AM
AFAIK that is correct: if the driver dies, your job will fail.
Also check this topic.
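On the practical side, a common mitigation (not true HA) is to let the job itself retry when a run fails, for example because the driver went away. Below is a minimal sketch using the Jobs API 2.1 jobs/create endpoint; the workspace URL, token, notebook path, and table/job names are hypothetical placeholders, and the retry fields shown (max_retries, min_retry_interval_millis, retry_on_timeout) are task-level settings:

```python
# Minimal sketch: create a job whose task retries if the run fails
# (e.g. the driver dies). Host, token, and notebook path are
# hypothetical placeholders -- adjust to your environment.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # hypothetical
TOKEN = "<personal-access-token>"                                   # hypothetical

payload = {
    "name": "nightly-etl-with-retries",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Jobs/etl_notebook"},  # hypothetical path
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
            # Retry settings: re-run the task up to 2 times if it fails,
            # waiting at least 5 minutes between attempts.
            "max_retries": 2,
            "min_retry_interval_millis": 300000,
            "retry_on_timeout": False,
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # response contains the new job_id
```

Note that a retried run starts on a fresh driver, so this is a recovery path rather than a hot standby like EMR's multi-master setup.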
There are ways to run Spark in HA, but I don't think it is possible on Databricks at the moment:
https://gist.github.com/aseigneurin/3af6b228490a8deab519c6aea2c209bc
If you absolutely need HA for the master/driver, I'd reach out to Databricks support.
FWIW: I do not encounter any issues with masters/drivers dying, unless I write bad code (hammering the driver with a lot of data; see the sketch below). The Spark driver itself is pretty relaxed normally; the workers/executors are the ones stuffed with work.
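To illustrate what "hammering the driver" means, here is a small PySpark sketch contrasting a pattern that pulls all data onto the driver with one that keeps the work on the executors; the table and column names (events, event_time) are hypothetical:

```python
# Small sketch of driver-hostile vs. driver-friendly code.
# "events" and "event_time" are hypothetical names.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("events")

# Driver-hostile: collect() ships every row into the driver's memory.
# On a large table this can OOM the driver and kill the whole job.
# rows = df.collect()

# Driver-friendly: aggregate on the executors and only bring back
# the (small) result, or write the output directly to storage.
daily_counts = df.groupBy(F.to_date("event_time").alias("day")).count()
daily_counts.write.mode("overwrite").saveAsTable("events_daily_counts")

# If you really need data on the driver, cap it explicitly.
sample = df.limit(1000).toPandas()
```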
05-22-2023 12:23 AM
Hi @Mars Su
We haven't heard from you since the last response from @Werner Stinckens and @karthik p, and I was checking back to see if their suggestions helped you.
If you have found a solution, please share it with the community, as it can be helpful to others.
Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.