Hi All,
I am curious to know the difference between a Spark cluster and a Databricks one.
As per the info I have read, a plain Spark cluster creates the driver and workers when the application is submitted, whereas in Databricks we can create the cluster in advance (an interactive cluster), or have one created on the fly for a job cluster.
I need to understand what resides inside a worker. As per the documentation, each worker runs a Docker image that has all the necessary components to run a worker, but I still have some questions:
1. How much memory is available after the Docker image is installed? It must be less than the node's advertised memory, as a DS3_v2 will not actually expose the full 14 GB, or anything close to that, to Spark.
2. What is the resource manager in Databricks? It seems to be the Spark Standalone resource manager. Can we change that to YARN or Mesos?