Databricks Cluster

DBEnthusiast
New Contributor III

Hi All,

I am curious to know the difference between a Spark cluster and a Databricks one.

As per the info I have read, a Spark cluster creates the driver and workers when the application is submitted, whereas in Databricks we can create a cluster in advance (an interactive cluster), and a cluster is created on the fly for a job cluster.

I need to understand what resides inside a worker. As per the documentation, workers run a Docker image that has everything needed to run a worker, but I still have some questions:

1. How much memory is available after the Docker image is installed? It would definitely be less than the memory available initially, as a DS3v2 will not have its full 14 GB, or close to that, left over.

2. What is the resource manager in Databricks? It seems to be the standalone resource manager. Can we change that to YARN or Mesos?

Accepted Solution

Kaniz_Fatma
Community Manager

Hi @DBEnthusiast, in a Spark cluster the SparkContext object in your main program (the driver program) connects to a cluster manager, which could be Spark's standalone cluster manager, Mesos, YARN, or Kubernetes. This cluster manager allocates resources across applications.

Once connected, Spark acquires executors on nodes in the cluster: processes that run computations and store data for your application. The SparkContext then sends your application code and tasks to the executors to run. In Databricks, a similar process occurs.

However, Databricks allows you to create a cluster in advance for interactive clusters, and a cluster is created on the fly for job clusters.
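
To make the contrast concrete, here is a minimal sketch of how an application outside Databricks declares its cluster manager and resources itself (the host name and sizes are illustrative assumptions, not real endpoints):

    from pyspark.sql import SparkSession

    # Outside Databricks you choose the cluster manager via the master URL
    # and request resources at submit time. All values here are examples.
    spark = (
        SparkSession.builder
        .appName("my-app")
        .master("spark://my-master:7077")        # standalone; or "yarn"
        .config("spark.executor.memory", "4g")   # per-executor heap
        .config("spark.executor.cores", "2")
        .getOrCreate()
    )

    # On Databricks a SparkSession named `spark` already exists, attached
    # to the cluster you created; you never call .master() yourself.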

Now, to answer your questions:

1. The memory available after the Docker image is installed will be less than the initial memory. The exact amount depends on the specific Docker image and other configurations, so without those details it's impossible to give a precise figure. (The sketch after these answers shows how to check from a notebook.)

2. In Databricks, the resource manager is generally a standalone resource manager. In a general Spark setup, however, Spark is agnostic to the underlying cluster manager and can work with standalone, Mesos, YARN, or Kubernetes. (The same sketch shows how to confirm the master from a notebook.)
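
As a minimal sketch, both points can be checked from a notebook attached to the cluster; these are standard Spark properties, and the values reported will vary by cluster:

    # 1. Executor heap actually granted to Spark. This is lower than the
    #    VM's physical RAM (e.g. the nominal 14 GB of a DS3v2) because the
    #    OS, the Databricks runtime, and container overhead are carved
    #    out first. The property is set in the cluster's Spark config.
    print(spark.sparkContext.getConf().get("spark.executor.memory"))

    # 2. The cluster manager in use. On Databricks this typically reports
    #    a spark://... URL, i.e. the standalone manager; the master is
    #    fixed per cluster and cannot be swapped for YARN or Mesos.
    print(spark.sparkContext.master)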




DBEnthusiast
New Contributor III

Hi @Kaniz_Fatma,

Thanks for your last response.

As per my understanding, when a user submits an application to a Spark cluster, it specifies how much memory, how many executors, etc. it will need.

But in Databricks notebooks we never specify that anywhere. If we have submitted the notebook to a job cluster, how does the Databricks resource manager decide how many resources to allocate to it?

In a cluster backed by a pool, I understand we have idle instances that can be allocated to a cluster, but I still don't understand how many resources a notebook will be assigned.
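
For illustration only, here is a minimal sketch of a Jobs API 2.1 payload, written as a Python dict; the job name, path, and sizes are made-up assumptions. It shows that the resources a notebook run receives are fixed by the new_cluster block in the job definition, not by anything inside the notebook:

    # Hypothetical job definition: the cluster size and node type are set
    # here, up front; the notebook itself never requests resources.
    job_spec = {
        "name": "example-notebook-job",
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": "/Users/me/my_notebook"},
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",  # 4 cores, 14 GB each
                    "num_workers": 2,
                    # With a pool, "instance_pool_id" replaces "node_type_id";
                    # idle pool VMs then back the cluster, but the size still
                    # comes from num_workers.
                },
            }
        ],
    }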
