Data Engineering

Understanding Serverless Compute Sharing Across Notebooks in Databricks

Akshay_Petkar
Contributor III

Hi Community,

I am using Databricks Serverless compute in notebooks. When I create multiple notebooks and choose Serverless as the compute, I noticed that I can select the same serverless cluster for all of them.

This brings up a few questions:

  1. Is this serverless compute shared across all notebooks (and users), or does each notebook/user get a separate compute instance behind the scenes?

  2. If it is shared, is there a max limit on CPU cores or concurrency for the serverless compute engine?

  3. How does Databricks handle auto-scaling or resource isolation when multiple users are running queries using the same serverless compute?

Thanks in advance for the clarification!

Akshay Petkar
1 ACCEPTED SOLUTION


BigRoux
Databricks Employee
Databricks Serverless compute operates with shared infrastructure, enabling multiple notebooks and users to utilize the same serverless cluster while maintaining isolation. This is achieved through features like client, driver, and executor isolation, ensuring workload security and preventing interference among users.
Key characteristics and behaviors include:
  1. Shared Compute for Users and Notebooks: Serverless compute allows for secure sharing across users and notebooks on the same cluster while leveraging identity management and sandboxing techniques to ensure isolation.
  2. Scaling and Concurrency:
    • By default, there is no concurrency limit set for serverless environments, allowing high-concurrency operations. However, resource caps and autoscaling policies can be applied to control costs and optimize performance.
    • Horizontal autoscaling dynamically adjusts resources based on workload requirements, transitioning quickly between scaling up during peak loads and scaling down during low usage to conserve resources and minimize idle costs.
  3. Resource Limits and Auto-Scaling:
    • Serverless compute employs an advanced autoscaler that cannot be disabled. It scales resources intelligently by leveraging workload patterns and pre-provisioning warm pools of instances for fast startup times. However, per-hour cost scaling limits are imposed to avoid runaway expenses, and higher resource caps can be requested as needed.
  4. Resource Isolation: Even in shared environments, workloads enjoy resource isolation. This ensures independent execution of tasks without resource bottlenecking, particularly valuable for multi-task jobs or streaming pipelines, enhancing both performance and cost-efficiency.
  5. Billing: Users are billed for actual workload activity rather than idle resources. Databricks manages all infrastructure intricacies, including instance type selection and scaling, resulting in simpler operations for users (a sample usage query is sketched after this list).
These features together ensure that Databricks Serverless compute is cost-efficient, scalable, and secure for multi-user and multi-notebook usage, while providing robust resource handling through intelligent autoscaling and workload isolation.
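
If you want to verify the pay-for-what-you-use behavior yourself, here is a minimal sketch, assuming your workspace has system tables enabled and you have SELECT access to system.billing.usage; the SERVERLESS filter on sku_name and the 7-day window are illustrative choices, not prescriptive ones:

```python
# Minimal sketch: summarize recent serverless DBU usage per day.
# Assumes system tables are enabled and you can query system.billing.usage;
# the sku_name filter and 7-day lookback are illustrative.
usage = spark.sql("""
    SELECT
        usage_date,
        sku_name,
        SUM(usage_quantity) AS dbus_consumed
    FROM system.billing.usage
    WHERE sku_name LIKE '%SERVERLESS%'
      AND usage_date >= date_sub(current_date(), 7)
    GROUP BY usage_date, sku_name
    ORDER BY usage_date, sku_name
""")
display(usage)
```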
 
Hope this helps, Lou.


3 REPLIES


Hi @BigRoux ,
I am using serverless compute to run a hash validation script across a large number of tables. While serverless is supposed to adjust resources automatically based on workload (scaling up during peaks and scaling down during idle periods), I am noticing that the driver gets detached automatically, especially when processing large tables.

1. What is the reason the driver detaches automatically in serverless compute, especially when processing large tables?
2. What is the solution or best practice to prevent this issue and ensure stable processing?

BigRoux
Databricks Employee

Could you clarify what you mean by "the driver detaches"? If the driver detaches, the cluster would typically fail. Are you using Spark for processing, or is this a pure Python workload? If you're using pure Python, only the driver node is utilized, since Python doesn't support distributed execution in this context.
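
For illustration, here is a minimal sketch of what keeping the work on Spark could look like for a hash validation: per-row hashing runs on the executors, and the driver only receives one aggregated row per table. The table names are hypothetical, and summing row hashes is just one cheap fingerprinting choice:

```python
from pyspark.sql import functions as F

# Hypothetical tables to validate; replace with your own list.
tables = ["main.default.orders", "main.default.customers"]

for name in tables:
    df = spark.table(name)
    # xxhash64 over all columns produces a per-row hash on the executors.
    # Summing the hashes (cast to decimal to avoid overflow) gives an
    # order-independent fingerprint; only one small row reaches the driver.
    summary = (
        df.select(F.xxhash64(*df.columns).alias("row_hash"))
          .agg(
              F.sum(F.col("row_hash").cast("decimal(38,0)")).alias("table_hash"),
              F.count("*").alias("row_count"),
          )
          .collect()[0]
    )
    print(name, summary["table_hash"], summary["row_count"])
```

If the existing script instead pulls whole tables to the driver (for example via collect() or toPandas()), driver-side memory pressure is one plausible cause of the behavior described above.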

 

Please provide more details and any demonstrable evidence that the driver is being detached.

 

Thanks,

Louis
