Differences between Spark Cluster Manager and Databricks Cluster Manager?

jwilliam
Contributor

I couldn't find any documentation on the Databricks Cluster Manager. Could anyone point me to some resources on this topic?

User16752242622
Databricks Employee

Hi @John William​ 

Databricks clusters use Spark's Standalone cluster manager. Each Databricks cluster has its own standalone Master and Worker processes, which run inside LXC containers and share a lifecycle with the cluster. Each cluster has a single Driver process, which acts as the sole Spark application for the standalone cluster.

Here is the official Spark Standalone cluster mode doc: https://spark.apache.org/docs/latest/spark-standalone.html
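
If you want to check this from a notebook, one option is to look at the master URL the driver is connected to. Here is a minimal sketch, assuming a `spark` session is already available in the notebook. The helper name `cluster_manager_from_master` is made up for illustration, and it only recognizes the master URL formats listed in the Spark docs:

```python
def cluster_manager_from_master(master: str) -> str:
    """Classify a Spark master URL by the cluster manager it points at.

    Hypothetical helper for illustration; it only covers the master URL
    formats documented by Apache Spark.
    """
    if master.startswith("spark://"):
        return "standalone"
    if master.startswith("k8s://"):
        return "kubernetes"
    if master.startswith("mesos://"):
        return "mesos"
    if master == "yarn":
        return "yarn"
    if master == "local" or master.startswith("local["):
        return "local"
    return "unknown"


# In a notebook with an active session (assumption: `spark` exists):
# print(cluster_manager_from_master(spark.sparkContext.master))
```

A `spark://host:port` master URL would be consistent with the standalone setup described above. I have not listed an expected output because the exact value depends on your cluster.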

Hi @Akash Bhat, thank you for your reply. I'm really surprised that Databricks clusters use Spark's Standalone cluster manager because, if I read this correctly, Databricks uses Kubernetes as its cluster manager: https://www.databricks.com/blog/2021/08/06/how-we-built-databricks-on-google-kubernetes-engine-gke.h...

Hi @John William​ 

The cluster manager launches worker instances and starts worker services.

The cluster manager issues API calls to a cloud provider (AWS or Azure) in order to obtain these instances for a cluster.

Databricks on GCP, by contrast, maintains Google Kubernetes Engine (GKE) node pools for provisioning the driver node and the executor nodes.