
A Standard cluster is recommended for a single user - what is meant by that?

Tahseen0354
Valued Contributor

Hi, I have seen in the documentation that a Standard cluster is recommended for a single user. But why? What is meant by that? A colleague and I tested it on the same notebook. Both of us could use the same Standard all-purpose cluster in the same notebook at the same time. The only limitation was that we could not execute the same cell at the same time, which seems reasonable.

But if two people can use the same Standard all-purpose cluster in the same notebook at the same time, why is it recommended for a single user? Does that mean we should select a High Concurrency cluster when multiple people are collaborating in the same notebook at the same time for simple data read and write experiments?

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

A High Concurrency cluster simply splits resources between users more evenly. So when 4 people run notebooks at the same time on a cluster with 4 CPUs, you can imagine that each of them gets 1 CPU.

On a Standard cluster, one person can use all the worker CPUs. If your job has multiple partitions (for example 4), it needs multiple cores: 1 CPU processes 1 partition at a time, so all 4 CPUs are busy processing the 4 partitions. Other users' jobs then wait in the queue until your job has finished.
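
To make the difference concrete, here is a toy Python simulation of the two behaviours described above. It is only a sketch under simplifying assumptions: every task (partition) takes one time step on one core, and the names `simulate`, `alice`, and `bob` are made up for illustration. It is not how Spark's scheduler is actually implemented.

```python
def simulate(jobs, cores, mode):
    """Return {user: finish_time} for a toy cluster.

    jobs  -- list of (user, number_of_tasks) in submission order
    cores -- number of tasks that can run in one time step
    mode  -- "fifo" (first job takes what it needs) or "fair" (round-robin)
    """
    remaining = dict(jobs)  # user -> tasks left; keeps submission order
    finish = {}
    t = 0
    while remaining:
        t += 1
        free = cores
        if mode == "fifo":
            # Earlier jobs take every core they can use before later jobs get any.
            for user in list(remaining):
                used = min(free, remaining[user])
                remaining[user] -= used
                free -= used
        elif mode == "fair":
            # Hand out cores one at a time, cycling over jobs that still need one.
            alloc = {user: 0 for user in remaining}
            progressed = True
            while free > 0 and progressed:
                progressed = False
                for user in remaining:
                    if free > 0 and alloc[user] < remaining[user]:
                        alloc[user] += 1
                        free -= 1
                        progressed = True
            for user, used in alloc.items():
                remaining[user] -= used
        else:
            raise ValueError("mode must be 'fifo' or 'fair'")
        # Record the finish time of every job that ran out of tasks this step.
        for user in [u for u, left in remaining.items() if left == 0]:
            finish[user] = t
            del remaining[user]
    return finish


# alice submits a 16-partition job first, bob a 4-partition job right after.
jobs = [("alice", 16), ("bob", 4)]
print("fifo:", simulate(jobs, cores=4, mode="fifo"))
print("fair:", simulate(jobs, cores=4, mode="fair"))
```

In this toy model bob's small job has to wait for alice's job under first-come-first-served scheduling, while under even sharing it finishes early and alice's job takes slightly longer.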

On a Standard cluster you can also manage resource allocation at the notebook level using scheduler pools. To do that, set the following SparkContext local property in the first line of the notebook:

spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
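
As a minimal sketch, the same call can be wrapped in a small helper that sets the pool and reads it back to confirm it. The helper name `assign_pool` is hypothetical; `setLocalProperty` and `getLocalProperty` are SparkContext methods, and `spark` is assumed to be the session that a Databricks notebook provides. The property applies to jobs submitted from the current thread.

```python
def assign_pool(spark, pool_name):
    """Send jobs submitted from the current thread to the named scheduler pool."""
    sc = spark.sparkContext
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    # Read the property back so the caller can confirm which pool is active.
    return sc.getLocalProperty("spark.scheduler.pool")


# In a notebook, where `spark` already exists (assumption):
# assign_pool(spark, "pool1")
```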


Replies

Atanu
Databricks Employee
  • Standard clusters are ideal for processing large amounts of data with Apache Spark. We recommend a Standard cluster for a single user because it is meant to handle less load than a High Concurrency cluster.
  • High Concurrency clusters are ideal for groups of users who need to share resources or run ad-hoc jobs. Administrators usually create High Concurrency clusters. Their key benefit is that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.
  • You can go through this best-practices guide, which may help: https://docs.databricks.com/clusters/cluster-config-best-practices.html

Tahseen0354
Valued Contributor

Thank you so much for your reply. So I think it is more related to how the load is handled, not how many users are using the cluster.

Thank you so much for your reply. Now it makes more sense.
