Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

How to choose the right node type for my Databricks workload?

Ednexllc
New Contributor

Hey all, I’m setting up a Databricks workspace to run data pipelines and ML training. I’m unsure which node types to pick (driver vs. worker, instance size, memory- vs. CPU-optimized) for different workloads: for example, small ETL jobs vs. large batch processing, or training medium-sized ML models vs. serving inference workloads.

Can someone share how to decide on node sizing and type based on workload pattern? What are the trade-offs (cost, performance, scalability)?

1 REPLY

szymon_dybczak
Esteemed Contributor III

Hi @Ednexllc ,

It really depends on many factors, like workload type, the size of your data, the number of tables, etc. You can check some recommendations given by Databricks here:

Compute configuration recommendations | Databricks on AWS

And also here; you should find useful info in the section called Databricks Cluster Configuration and Tuning:

Comprehensive Guide to Optimize Data Workloads | Databricks


But for me, it’s always a process that requires some trial and error at the beginning. I try different settings and, in the end, choose the ones that handle the given workload best.

So, to put it simply: there's no silver bullet. You can use some guidelines, but in the end you need to test and align compute to your workload/environment yourself.
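To make the trial-and-error loop a bit more systematic, a sketch like the one below can help: keep a small table of starting presets per workload pattern and generate a cluster-config payload from it. The instance types, autoscale ranges, and the `WORKLOAD_PRESETS`/`cluster_config` names here are illustrative assumptions for this example, not official Databricks recommendations; treat them as a baseline to benchmark and adjust.

```python
# Hypothetical starting-point presets: workload pattern -> AWS instance
# family and autoscaling range. These values are illustrative assumptions
# to benchmark against, not Databricks guidance.
WORKLOAD_PRESETS = {
    # Small ETL: general-purpose nodes, small autoscale range
    "small_etl":   {"node_type_id": "m5.xlarge",  "min_workers": 1, "max_workers": 4},
    # Large batch with heavy joins/shuffles: memory-optimized workers
    "large_batch": {"node_type_id": "r5.2xlarge", "min_workers": 4, "max_workers": 16},
    # CPU-bound ML training: compute-optimized nodes
    "ml_training": {"node_type_id": "c5.4xlarge", "min_workers": 2, "max_workers": 8},
    # Low-latency inference: small fixed-size cluster to avoid scaling churn
    "inference":   {"node_type_id": "m5.large",   "min_workers": 2, "max_workers": 2},
}


def cluster_config(workload: str, spark_version: str = "14.3.x-scala2.12") -> dict:
    """Build a cluster-creation-style payload from a preset (sketch only)."""
    preset = WORKLOAD_PRESETS[workload]
    return {
        "spark_version": spark_version,
        "node_type_id": preset["node_type_id"],
        "autoscale": {
            "min_workers": preset["min_workers"],
            "max_workers": preset["max_workers"],
        },
    }
```

The point isn't the specific numbers; it's that once the presets live in one place, you can rerun the same job against each candidate config, compare cost and runtime, and update the table with whatever wins.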
