
Understanding Serverless Compute Sharing Across Notebooks in Databricks

Akshay_Petkar — Tue, 03 Jun 2025 04:33:54 GMT

Hi Community,

I am using Databricks Serverless compute in notebooks. When I create multiple notebooks and choose Serverless as the compute, I can select the same serverless cluster for all of them.

This brings up a few questions:

1. Is this serverless compute shared across all notebooks (and users), or does each notebook/user get a separate compute instance behind the scenes?
2. If it is shared, is there a max limit on CPU cores or concurrency for the serverless compute engine?
3. How does Databricks handle auto-scaling or resource isolation when multiple users are running queries using the same serverless compute?

Thanks in advance for the clarification!

Re: Understanding Serverless Compute Sharing Across Notebooks in Databricks

Louis_Frolio — Tue, 03 Jun 2025 19:55:30 GMT

Databricks Serverless compute operates with shared infrastructure, enabling multiple notebooks and users to utilize the same serverless cluster while maintaining isolation. This is achieved through features like client, driver, and executor isolation, ensuring workload security and preventing interference among users.

Key characteristics and behaviors include:

Shared Compute for Users and Notebooks: Serverless compute allows for secure sharing across users and notebooks on the same cluster while leveraging identity management and sandboxing techniques to ensure isolation.
Scaling and Concurrency:
- By default, there is no concurrency limit set for serverless environments, allowing high-concurrency operations. However, resource caps and autoscaling policies can be applied to control costs and optimize performance.
- Horizontal autoscaling dynamically adjusts resources based on workload requirements, transitioning quickly between scaling up during peak loads and scaling down during low usage to conserve resources and minimize idle costs.
Resource Limits and Auto-Scaling:
- Serverless compute employs an advanced autoscaler that cannot be disabled. It scales resources intelligently by leveraging workload patterns and pre-provisioning warm pools of instances for fast startup times. However, per-hour cost scaling limits are imposed to avoid runaway expenses, and higher resource caps can be requested as needed.
Resource Isolation: Even in shared environments, workloads enjoy resource isolation. This ensures independent execution of tasks without resource bottlenecking, particularly valuable for multi-task jobs or streaming pipelines, enhancing both performance and cost-efficiency.
Billing: Users are billed for actual workload activity rather than idle resources. Databricks manages all infrastructure intricacies—including instance type selection and scaling—resulting in simpler operations for users.

These features together ensure that Databricks Serverless compute is cost-efficient, scalable, and secure for multi-user and multi-notebook usage, while providing robust resource handling through intelligent autoscaling and workload isolation.

Hope this helps, Lou.

Re: Understanding Serverless Compute Sharing Across Notebooks in Databricks

vidya_kothavale — Wed, 04 Jun 2025 04:11:59 GMT

Hi @Louis_Frolio ,
I am using serverless compute to run a hash validation script across a large number of tables. Serverless is supposed to adjust resources automatically based on workload, scaling up during peak periods and scaling down when idle. However, I am noticing that the driver gets detached automatically, especially when processing large tables.

1. What is the reason the driver detaches automatically in serverless compute, especially when processing large tables?
2. What is the solution or best practice to prevent this issue and ensure stable processing?

Re: Understanding Serverless Compute Sharing Across Notebooks in Databricks

Louis_Frolio — Wed, 04 Jun 2025 11:19:14 GMT

Could you clarify what you mean by “the driver detaches”? If the driver detaches, the cluster would typically fail. Are you using Spark for processing, or is this a pure Python workload? If it is pure Python, only the driver node is used, because plain Python code is not distributed across executors in this context.
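To illustrate the pure Python case: if the script runs only on the driver, memory stays bounded only when rows are processed in chunks rather than loaded all at once. The following is a minimal sketch, not your actual script. It assumes the validation compares a per-table row count and checksum, and the names `row_digest` and `table_checksum` are hypothetical. It combines per-row hashes with modular addition so that row order does not affect the result.

```python
import hashlib
from typing import Iterable, Sequence, Tuple


def row_digest(row: Sequence[object]) -> int:
    """Hash one row to a 64-bit integer (SHA-256 truncated to 8 bytes)."""
    # repr() plus a unit separator keeps ("a", "bc") distinct from ("ab", "c").
    encoded = "\x1f".join(repr(value) for value in row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest()[:8], "big")


def table_checksum(chunks: Iterable[Iterable[Sequence[object]]]) -> Tuple[int, int]:
    """Return (row_count, checksum) while holding only one chunk at a time."""
    count = 0
    checksum = 0
    for chunk in chunks:
        for row in chunk:
            count += 1
            # Modular addition is commutative, so row order does not matter.
            checksum = (checksum + row_digest(row)) % (1 << 64)
    return count, checksum


# Hypothetical data: the same three rows, chunked and ordered differently.
source = [[(1, "a"), (2, "b")], [(3, "c")]]
target = [[(3, "c")], [(2, "b"), (1, "a")]]
print(table_checksum(source) == table_checksum(target))
```

Even written this way, the work still happens on a single node. If the tables are large, expressing the same count-and-checksum aggregation through Spark DataFrames is what lets the executors share the load.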

Please provide more details and any demonstrable evidence that the driver is being detached.

Thanks,

Louis