09-30-2022 12:56 AM
I couldn't find any documentation on the Databricks cluster manager. Could anyone point me to some resources on this topic?
Accepted Solutions
09-30-2022 05:58 AM
Hi @John William
Databricks clusters use Spark's Standalone cluster manager. Each Databricks cluster has its own standalone Master and Worker processes, which run inside LXC containers and share a lifecycle with the cluster. Each cluster has a single Driver process, which acts as the sole Spark application for the standalone cluster.
Here is the official Spark Standalone cluster mode doc: https://spark.apache.org/docs/latest/spark-standalone.html
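If you want to check this yourself, one option is to look at the Spark master URL from a notebook. The sketch below is only an illustration: `cluster_manager_from_master` is a hypothetical helper (not a Databricks or Spark API) that maps the master URL formats described in the Spark docs to the cluster manager they imply. I am assuming the value reported on your cluster starts with `spark://`, as you would expect for standalone mode, but I have not verified it on every runtime.

```python
def cluster_manager_from_master(master_url: str) -> str:
    """Map a Spark master URL to the cluster manager it implies.

    Hypothetical helper for illustration; the prefixes follow the
    master URL formats listed in the Spark documentation.
    """
    url = master_url.strip().lower()
    if url.startswith("spark://"):
        return "standalone"   # e.g. spark://host:7077
    if url.startswith("k8s://"):
        return "kubernetes"   # e.g. k8s://https://host:443
    if url.startswith("mesos://"):
        return "mesos"
    if url == "yarn":
        return "yarn"
    if url == "local" or url.startswith("local["):
        return "local"        # e.g. local[*], local[4]
    return "unknown"


# In a notebook with an active SparkSession named `spark` (assumed),
# you could call:
#   cluster_manager_from_master(spark.sparkContext.master)
```

The function only inspects a string, so it runs without Spark installed; the commented call at the end is where the real master URL would come from.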
10-05-2022 12:35 AM
Hi @Akash Bhat , thank you for your reply. I am really surprised that Databricks clusters use Spark's Standalone cluster manager because, if I read this correctly, Databricks uses Kubernetes as its cluster manager: https://www.databricks.com/blog/2021/08/06/how-we-built-databricks-on-google-kubernetes-engine-gke.h...
10-06-2022 11:32 AM
Hi @John William
The cluster manager launches worker instances and starts worker services.
On AWS and Azure, the cluster manager issues API calls to the cloud provider to obtain these instances for a cluster.
Databricks on GCP, by contrast, maintains Google Kubernetes Engine (GKE) node pools for provisioning the driver node and the executor nodes.

