Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Differences between Spark Cluster Manager and Databricks Cluster Manager?

jwilliam
Contributor

I couldn't find any documentation on the Databricks Cluster Manager. Could anyone point me to some resources on this topic?

1 ACCEPTED SOLUTION

User16752242622
Valued Contributor

Hi @John William

Databricks clusters use Spark's Standalone cluster manager. Each Databricks cluster has its own standalone Master and Worker processes, which run inside LXC containers and share a lifecycle with the cluster. Each cluster has a single Driver process, which acts as the sole Spark application for the standalone cluster.

Here is the official Spark Standalone cluster mode doc: https://spark.apache.org/docs/latest/spark-standalone.html
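
If you want to check this on your own cluster, one rough approach is to look at the Spark master URL, since its scheme tells you which cluster manager the application is attached to. The sketch below is not a Databricks API: `classify_master` is a hypothetical helper, and the URL prefixes it checks are the master URL formats described in the Spark documentation. The IP address in the example is made up.

```python
def classify_master(master_url: str) -> str:
    """Map a Spark master URL to the cluster manager it selects.

    Hypothetical helper for illustration; the prefixes follow the
    master URL formats listed in the Spark documentation.
    """
    if master_url.startswith("spark://"):
        return "standalone"
    if master_url.startswith("k8s://"):
        return "kubernetes"
    if master_url.startswith("mesos://"):
        return "mesos"
    if master_url == "yarn":
        return "yarn"
    if master_url.startswith("local"):
        return "local"
    return "unknown"


# In a notebook with an active SparkSession named `spark` (an assumption),
# you could pass the live value instead of a literal:
#     print(classify_master(spark.sparkContext.master))
print(classify_master("spark://10.0.0.1:7077"))  # made-up address
```

A `spark://host:port` master corresponds to the Standalone manager described above. What a given cluster actually reports may vary by cluster type, so treat the result as a hint rather than a guarantee.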

3 REPLIES

Hi @Akash Bhat, thank you for your reply. I'm really surprised that Databricks clusters use Spark's Standalone cluster manager because, if I read this post correctly, Databricks uses Kubernetes as its cluster manager: https://www.databricks.com/blog/2021/08/06/how-we-built-databricks-on-google-kubernetes-engine-gke.h...

Hi @John William

The cluster manager launches worker instances and starts the worker services.

To obtain these instances for a cluster, the cluster manager issues API calls to the cloud provider (AWS or Azure).

Databricks on GCP, by contrast, maintains Google Kubernetes Engine (GKE) node pools for provisioning the driver node and the executor nodes.
