Databricks Cluster

DBEnthusiast
New Contributor III

Hi All,

I am curious to know the difference between a Spark cluster and a Databricks one.

As per the info I have read, a Spark cluster creates the driver and workers when the application is submitted, whereas in Databricks we can create a cluster in advance (an interactive cluster), and a cluster is created on the fly for a job cluster.

I need to understand what resides inside a worker. As per the documentation, workers run a Docker image that has everything needed to run a worker, but I still have some questions:

1. How much memory is available after the Docker image is installed? It would definitely be less than the memory available initially, as a DS3v2 will not have its full 14 GB, or close to that, left over.

2. What is the resource manager in Databricks? It seems to be the standalone resource manager. Can we change that to YARN or Mesos?

Accepted Solution

Kaniz_Fatma
Community Manager

Hi @DBEnthusiast, in a Spark cluster the SparkContext object in your main program (the driver program) connects to a cluster manager, which could be Spark's standalone cluster manager, Mesos, YARN, or Kubernetes. This cluster manager allocates resources across applications.

Once connected, Spark acquires executors on nodes in the cluster: processes that run computations and store data for your application. The SparkContext then sends your application code and tasks to the executors to run. In Databricks, a similar process occurs.

However, Databricks allows you to create a cluster in advance for interactive clusters, and a cluster is created on the fly for job clusters.
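
To make the contrast concrete, here is a minimal sketch of how an application outside Databricks declares its cluster manager and resources itself (the host name and sizes are illustrative assumptions, not real endpoints):

    from pyspark.sql import SparkSession

    # Outside Databricks you choose the cluster manager via the master URL
    # and request resources at submit time. All values here are examples.
    spark = (
        SparkSession.builder
        .appName("my-app")
        .master("spark://my-master:7077")        # standalone; or "yarn"
        .config("spark.executor.memory", "4g")   # per-executor heap
        .config("spark.executor.cores", "2")
        .getOrCreate()
    )

    # On Databricks a SparkSession named `spark` already exists, attached
    # to the cluster you created; you never call .master() yourself.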

Now, to answer your questions:

1. The memory available after the Docker image is installed will be less than the initial memory. The exact amount depends on the specific Docker image and other configurations, so without those details it's impossible to give a precise figure. (The sketch after these answers shows how to check from a notebook.)

2. In Databricks, the resource manager is generally a standalone resource manager. In a general Spark setup, however, Spark is agnostic to the underlying cluster manager and can work with standalone, Mesos, YARN, or Kubernetes. (The same sketch shows how to confirm the master from a notebook.)
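
As a minimal sketch, both points can be checked from a notebook attached to the cluster; these are standard Spark properties, and the values reported will vary by cluster:

    # 1. Executor heap actually granted to Spark. This is lower than the
    #    VM's physical RAM (e.g. the nominal 14 GB of a DS3v2) because the
    #    OS, the Databricks runtime, and container overhead are carved
    #    out first. The property is set in the cluster's Spark config.
    print(spark.sparkContext.getConf().get("spark.executor.memory"))

    # 2. The cluster manager in use. On Databricks this typically reports
    #    a spark://... URL, i.e. the standalone manager; the master is
    #    fixed per cluster and cannot be swapped for YARN or Mesos.
    print(spark.sparkContext.master)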




DBEnthusiast
New Contributor III

Hi @Kaniz_Fatma,

Thanks for your last response.

As per my understanding, when a user submits an application to a Spark cluster, it specifies how much memory, how many executors, etc. it will need.

But in Databricks notebooks we never specify that anywhere. If we have submitted the notebook to a job cluster, how does the Databricks resource manager decide how many resources to allocate to it?

In a cluster backed by a pool, I understand we have idle instances that can be allocated to a cluster, but I still don't understand how many resources a notebook will be assigned.
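
For illustration only, here is a minimal sketch of a Jobs API 2.1 payload, written as a Python dict; the job name, path, and sizes are made-up assumptions. It shows that the resources a notebook run receives are fixed by the new_cluster block in the job definition, not by anything inside the notebook:

    # Hypothetical job definition: the cluster size and node type are set
    # here, up front; the notebook itself never requests resources.
    job_spec = {
        "name": "example-notebook-job",
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": "/Users/me/my_notebook"},
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",  # 4 cores, 14 GB each
                    "num_workers": 2,
                    # With a pool, "instance_pool_id" replaces "node_type_id";
                    # idle pool VMs then back the cluster, but the size still
                    # comes from num_workers.
                },
            }
        ],
    }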
