Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

How to choose the right node type for my Databricks workload?

Ednexllc
New Contributor

Hey all, I’m setting up a Databricks workspace to run data pipelines and ML training. I’m unsure which node types to pick (driver vs. worker, instance size, memory- vs. CPU-optimized) for different workloads: for example, small ETL jobs vs. large batch processing, or training medium-sized ML models vs. serving inference workloads.

Can someone share how to decide on node sizing and type based on workload pattern? What are the trade-offs (cost, performance, scalability)?

1 REPLY

szymon_dybczak
Esteemed Contributor III

Hi @Ednexllc ,

It really depends on many factors, like workload type, the size of your data, the number of tables, etc. You can check some recommendations given by Databricks here:

Compute configuration recommendations | Databricks on AWS

And also here; you should find useful info in the section called Databricks Cluster Configuration and Tuning:

Comprehensive Guide to Optimize Data Workloads | Databricks


But for me, it’s always a process that requires some trial and error at the beginning. I try different settings and, in the end, choose the ones that handle the given workload best.

So, to put it simply: there's no silver bullet. You can use some guidelines, but in the end you need to test and align compute to your workload/environment yourself.
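To make the trial-and-error loop a bit more systematic, a sketch like the one below can help: keep a small table of starting presets per workload pattern and generate a cluster-config payload from it. The instance types, autoscale ranges, and the `WORKLOAD_PRESETS`/`cluster_config` names here are illustrative assumptions for this example, not official Databricks recommendations; treat them as a baseline to benchmark and adjust.

```python
# Hypothetical starting-point presets: workload pattern -> AWS instance
# family and autoscaling range. These values are illustrative assumptions
# to benchmark against, not Databricks guidance.
WORKLOAD_PRESETS = {
    # Small ETL: general-purpose nodes, small autoscale range
    "small_etl":   {"node_type_id": "m5.xlarge",  "min_workers": 1, "max_workers": 4},
    # Large batch with heavy joins/shuffles: memory-optimized workers
    "large_batch": {"node_type_id": "r5.2xlarge", "min_workers": 4, "max_workers": 16},
    # CPU-bound ML training: compute-optimized nodes
    "ml_training": {"node_type_id": "c5.4xlarge", "min_workers": 2, "max_workers": 8},
    # Low-latency inference: small fixed-size cluster to avoid scaling churn
    "inference":   {"node_type_id": "m5.large",   "min_workers": 2, "max_workers": 2},
}


def cluster_config(workload: str, spark_version: str = "14.3.x-scala2.12") -> dict:
    """Build a cluster-creation-style payload from a preset (sketch only)."""
    preset = WORKLOAD_PRESETS[workload]
    return {
        "spark_version": spark_version,
        "node_type_id": preset["node_type_id"],
        "autoscale": {
            "min_workers": preset["min_workers"],
            "max_workers": preset["max_workers"],
        },
    }
```

The point isn't the specific numbers; it's that once the presets live in one place, you can rerun the same job against each candidate config, compare cost and runtime, and update the table with whatever wins.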
