Databricks Community

-werners- · ‎02-21-2024

On our dev environment I created a cluster using shared access mode for our devs to use (instead of separate single-user clusters).

What I notice is that the performance of this cluster is terrible. And I mean really terrible: notebook cells without any action, so just DataFrame definitions, take minutes to complete, even though nothing has to be computed (lazy evaluation in Spark).

When I disable shared compute (so change to single user), performance is reasonable again.

Any ideas?
At the moment I am the only user using the cluster, so it can't be the cluster load.

-werners- · ‎02-22-2024

Thanks for the answer!

It seems that using shared access mode adds overhead. The nodes and the driver are not stressed at all (CPU/RAM/network).
We use UC only.
The cluster seems to be configured correctly (using the same cluster in single user mode changes performance drastically).
Calculating a query plan should not take more than 5 minutes imo.
Physically printing the query plan takes about 40 secs in single user mode, but takes over 5 minutes in shared.
And the only thing that has changed is the access mode.
So my tentative conclusion is that shared mode adds a massive overhead.
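
To put numbers on a comparison like this, each notebook step can be timed separately in both access modes. Below is a minimal sketch in plain Python, using only the standard library. The helper names `timed` and `slowdown` are made up for illustration and are not part of any Databricks or Spark API, and the two figures in the example are just the rough timings quoted above (about 40 seconds versus a bit over 5 minutes, taken here as 300 seconds).

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def timed(label: str, step: Callable[[], T], results: Dict[str, float]) -> T:
    """Run step(), store its wall-clock duration (seconds) under label, return its result."""
    start = time.perf_counter()
    try:
        return step()
    finally:
        # Recorded even if the step raises, so a failing step still shows up.
        results[label] = time.perf_counter() - start


def slowdown(single_user: Dict[str, float], shared: Dict[str, float]) -> Dict[str, float]:
    """Ratio shared / single_user for every label that was measured in both runs."""
    return {
        label: shared[label] / single_user[label]
        for label in single_user
        if label in shared and single_user[label] > 0
    }


# Stand-in step so the sketch runs anywhere; in a notebook the callable would
# wrap the DataFrame definition or the call that prints the query plan.
measurements: Dict[str, float] = {}
timed("define", lambda: sum(range(1000)), measurements)

# Rough timings from the post above: ~40 s in single user mode, >5 min in shared.
ratios = slowdown({"print plan": 40.0}, {"print plan": 300.0})
print(ratios)  # {'print plan': 7.5}
```

Running the same labelled steps once per access mode and comparing the two result dictionaries shows which step picks up the extra time: the definition itself, or printing the plan.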

prakharcode · ‎09-10-2024

I can confirm this behaviour. Running the same job on a shared cluster in "USER_ISOLATION" mode, with nothing changed in the job definition or the source data, the performance drop is significant. So much so that it would require a radical change in how we process data.


performance issues using shared compute access mode in scala
