Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Best practices for compute usage

juanjomendez96
New Contributor III

Hello there!

I am writing this open message to learn how you are using compute resources in your own use cases.

Currently, in my company, we have multiple compute instances that can be differentiated into two main types:

  1. Clusters with a large instance for batch processing. These clusters are typically used to process a large number of files in parallel (e.g., m5d.10xlarge).
  2. Clusters with a small instance for each data scientist on the team. These clusters are typically used for analysis, so there is no need for a large instance (e.g., m5d.xlarge).

For the first type, we have a clear understanding. One large compute instance for batch processing is sufficient for our current needs.

For the second type, we are unsure which option is better:

  • To have a small compute instance for each data scientist on the team (e.g., an m5d.xlarge instance).
  • To have one larger compute instance shared by all data scientists simultaneously (e.g., an m5d.10xlarge instance).

With the second option, we are unsure how to divide the compute instance among the data scientists on the team so that each one receives the same number of cores and amount of RAM, ensuring that all have the same computational power. This is why we have so far chosen one small compute instance per data scientist, but we are not certain what the best practices are in this situation.
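For context, one possible way to share a single large cluster evenly, without manually carving out cores and RAM per person, is Spark's FAIR scheduler, where each user's jobs run in their own pool. A minimal sketch follows; the pool name and standalone session setup are illustrative assumptions (on Databricks the scheduler mode would be set in the cluster's Spark config, and the session already exists):

```python
# Minimal sketch: fair scheduling on one shared cluster.
# Instead of hard-partitioning cores/RAM per user, Spark's FAIR
# scheduler gives concurrently running jobs a roughly equal share.
# The pool name "ds_alice" is an illustrative assumption.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shared-ds-cluster")
    .master("local[*]")  # local run for illustration only
    .config("spark.scheduler.mode", "FAIR")  # fair share across pools
    .getOrCreate()
)

# Each data scientist's session tags its jobs with its own pool,
# so one heavy job cannot starve everyone else's short queries.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "ds_alice")

df = spark.range(1_000_000)
print(df.count())  # this job runs in the "ds_alice" pool
```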

Therefore, we would appreciate your recommendation on which of these two options is most suitable. Perhaps there are alternative solutions that we have overlooked.

Thank you in advance for your time, team!

1 ACCEPTED SOLUTION


radothede
Valued Contributor II

Hello @juanjomendez96 ,

To the best of my knowledge and experience, an autoscaled shared cluster (using smaller instances) works well for most second-case scenarios (clusters for ad-hoc/development team usage).

This approach allows you to reuse resources across team members. Autoscaling will spin up additional nodes when needed and deallocate them when the workload drops. This keeps you cost-efficient (the same resources are reused, with no spin-up time for each personal cluster) and flexible for small to medium workloads.
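As a concrete illustration, a shared autoscaled cluster along these lines could be created with the Databricks Python SDK; the cluster name, runtime version, instance type, and worker range below are placeholder assumptions, not recommendations:

```python
# Minimal sketch using the Databricks Python SDK (databricks-sdk).
# All values (name, runtime version, node type, worker range) are
# illustrative placeholders - adjust them to your workspace.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()  # reads auth from env vars or a config profile

cluster = w.clusters.create(
    cluster_name="ds-team-shared",        # hypothetical name
    spark_version="15.4.x-scala2.12",     # pick a current LTS runtime
    node_type_id="m5d.xlarge",            # smaller instances, more of them
    autoscale=compute.AutoScale(
        min_workers=2,                    # baseline capacity
        max_workers=8,                    # cap for bursty ad-hoc work
    ),
    autotermination_minutes=30,           # release resources when idle
).result()                                # wait until the cluster is running

print(cluster.cluster_id)
```

Autotermination combined with a modest max_workers cap is typically what keeps this option cheaper than a fleet of always-on personal clusters.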

On the other hand, there are some drawbacks and limitations, such as resource contention (heavy workloads from one team member can consume the compute power of the shared cluster, impacting other team members' workloads), autoscaling latency (cold start of new nodes), and less predictable runtime performance compared to a fixed-size cluster.
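One hypothetical way to soften the contention and cost drawbacks is a cluster policy that pins the instance type and caps the autoscale range, so no single cluster can grab outsized capacity. The attribute limits below are illustrative assumptions:

```python
# Hypothetical cluster policy capping team clusters.
# Attribute paths and types follow the Databricks cluster-policy
# definition format; the concrete limits are illustrative assumptions.
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

policy_definition = {
    "node_type_id": {"type": "fixed", "value": "m5d.xlarge"},
    "autoscale.min_workers": {"type": "range", "maxValue": 2},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "autotermination_minutes": {"type": "fixed", "value": 30},
}

policy = w.cluster_policies.create(
    name="ds-team-policy",  # hypothetical name
    definition=json.dumps(policy_definition),
)
print(policy.policy_id)
```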

Best,

Radek.



juanjomendez96
New Contributor III

Hello @radothede,

First of all, thanks for the fast response; I really appreciate it.

Secondly, what you explained makes a lot of sense. We had not considered the option of autoscaling, so thanks for the heads-up.

I will talk with the team and see how we can manage this.

Thanks a lot!