Performance issues using shared compute access mode in Scala
02-21-2024 07:16 AM
In our dev environment I created a cluster in shared access mode for our developers to use (instead of separate single user clusters).
What I notice is that the performance of this cluster is terrible. And I mean really terrible: notebook cells that contain no actions, only DataFrame definitions, take minutes to complete, even though nothing has to be computed (Spark evaluates lazily).
When I switch the cluster from shared to single user access mode, performance is reasonable again.
Any ideas?
At the moment I am the only user using the cluster, so it can't be the cluster load.
02-22-2024 05:20 AM
Thanks for the answer!
It seems that using shared access mode adds overhead. The worker nodes and the driver are not stressed at all (CPU/RAM/network).
We use Unity Catalog (UC) only.
The cluster seems to be configured correctly (switching the same cluster to single user mode changes performance drastically).
Calculating a query plan should not take more than 5 minutes imo.
Just printing the query plan takes about 40 seconds in single user mode, but over 5 minutes in shared mode.
And the only thing that has changed is the access mode.
So my tentative conclusion is that shared mode adds a massive overhead.
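For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two timings could be measured. It is only a plain wall-clock helper, not an official Databricks tool, and the names `PlanTiming` and `timed` are made up for this example. The commented usage assumes the `spark` session that notebooks predefine, and the table name in it is hypothetical.

```scala
object PlanTiming {
  /** Runs `block` once, prints the elapsed wall-clock time, and returns (result, seconds). */
  def timed[A](label: String)(block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block // the by-name argument is evaluated here, exactly once
    val seconds = (System.nanoTime() - start) / 1e9
    println(f"$label took $seconds%.2f s")
    (result, seconds)
  }
}

// Notebook usage (assumes the predefined `spark` session; the table name is hypothetical):
//   import PlanTiming.timed
//   val (df, _) = timed("define") { spark.table("main.sales.orders").filter("amount > 0") }
//   timed("explain") { df.explain() }
```

Running the same cell once on the cluster in single user mode and once in shared mode should show whether the time goes into defining the DataFrame, into producing the plan, or into both.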
09-10-2024 03:00 AM
I can confirm this behaviour. When running the same job on a shared cluster in "USER_ISOLATION" mode, with no changes to the job definition or source data, the performance drop is significant. So much so that we would need to radically change how we process data.

