Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Best practices for compute usage

juanjomendez96
New Contributor III

Hello there!

I am writing this open message to learn how you are using compute resources in your own use cases.

Currently, in my company, we have multiple compute instances that can be differentiated into two main types:

  1. Clusters with a large instance for batch processing. These clusters are typically used to process a large number of files in parallel (e.g., m5d.10xlarge).
  2. Clusters with a small instance for each data scientist on the team. These clusters are typically used for analysis, so there is no need for a large instance (e.g., m5d.xlarge).

For the first type, we have a clear understanding. One large compute instance for batch processing is sufficient for our current needs.

For the second type, we are unsure which option is better:

  • To have a small compute instance for each data scientist on the team (e.g., an m5d.xlarge instance).
  • To have one larger compute instance shared by all data scientists simultaneously (e.g., an m5d.10xlarge instance).

With the second option, we are unsure how to divide the compute instance among the data scientists on the team so that each one receives the same number of cores and amount of RAM, ensuring that all have the same computational power. This is why we have so far chosen one small compute instance per data scientist, but we are not certain what the best practices are in this situation.
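For context, one possible way to share a single large cluster evenly, without manually carving out cores and RAM per person, is Spark's FAIR scheduler, where each user's jobs run in their own pool. A minimal sketch follows; the pool name and standalone session setup are illustrative assumptions (on Databricks the scheduler mode would be set in the cluster's Spark config, and the session already exists):

```python
# Minimal sketch: fair scheduling on one shared cluster.
# Instead of hard-partitioning cores/RAM per user, Spark's FAIR
# scheduler gives concurrently running jobs a roughly equal share.
# The pool name "ds_alice" is an illustrative assumption.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shared-ds-cluster")
    .master("local[*]")  # local run for illustration only
    .config("spark.scheduler.mode", "FAIR")  # fair share across pools
    .getOrCreate()
)

# Each data scientist's session tags its jobs with its own pool,
# so one heavy job cannot starve everyone else's short queries.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "ds_alice")

df = spark.range(1_000_000)
print(df.count())  # this job runs in the "ds_alice" pool
```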

Therefore, we would appreciate your recommendation on which of these two options is most suitable. Perhaps there are alternative solutions that we have overlooked.

Thank you in advance for your time, team!

1 ACCEPTED SOLUTION


radothede
Valued Contributor II

Hello @juanjomendez96 ,

To the best of my knowledge and experience, an autoscaled shared cluster (using smaller instances) works well for most second-case scenarios (clusters for ad-hoc/development team usage).

This approach allows you to reuse resources across team members. Autoscaling will spin up additional nodes when needed and deallocate them when the workload drops. This keeps you cost-efficient (the same resources are reused, with no spin-up time for each personal cluster) and flexible for small to medium workloads.
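As a concrete illustration, a shared autoscaled cluster along these lines could be created with the Databricks Python SDK; the cluster name, runtime version, instance type, and worker range below are placeholder assumptions, not recommendations:

```python
# Minimal sketch using the Databricks Python SDK (databricks-sdk).
# All values (name, runtime version, node type, worker range) are
# illustrative placeholders - adjust them to your workspace.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()  # reads auth from env vars or a config profile

cluster = w.clusters.create(
    cluster_name="ds-team-shared",        # hypothetical name
    spark_version="15.4.x-scala2.12",     # pick a current LTS runtime
    node_type_id="m5d.xlarge",            # smaller instances, more of them
    autoscale=compute.AutoScale(
        min_workers=2,                    # baseline capacity
        max_workers=8,                    # cap for bursty ad-hoc work
    ),
    autotermination_minutes=30,           # release resources when idle
).result()                                # wait until the cluster is running

print(cluster.cluster_id)
```

Autotermination combined with a modest max_workers cap is typically what keeps this option cheaper than a fleet of always-on personal clusters.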

On the other hand, there are some drawbacks and limitations, such as resource contention (heavy workloads from one team member can consume the compute power of the shared cluster, impacting other team members' workloads), autoscaling latency (cold start of new nodes), and less predictable runtime performance compared to a fixed-size cluster.
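One hypothetical way to soften the contention and cost drawbacks is a cluster policy that pins the instance type and caps the autoscale range, so no single cluster can grab outsized capacity. The attribute limits below are illustrative assumptions:

```python
# Hypothetical cluster policy capping team clusters.
# Attribute paths and types follow the Databricks cluster-policy
# definition format; the concrete limits are illustrative assumptions.
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

policy_definition = {
    "node_type_id": {"type": "fixed", "value": "m5d.xlarge"},
    "autoscale.min_workers": {"type": "range", "maxValue": 2},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "autotermination_minutes": {"type": "fixed", "value": 30},
}

policy = w.cluster_policies.create(
    name="ds-team-policy",  # hypothetical name
    definition=json.dumps(policy_definition),
)
print(policy.policy_id)
```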

Best,

Radek.



juanjomendez96
New Contributor III

Hello @radothede,

First of all, thanks for the fast response; I really appreciate it.

Secondly, what you explained makes a lot of sense. We had not considered the option of autoscaling, so thanks for the heads-up.

I will talk with the team and see how we can manage this.

Thanks a lot!