01-07-2025 11:24 PM
Hi,
Recently, I wrote some logic that collects a DataFrame to the driver and processes it row by row. I am using a 128 GB driver node, but it is taking significantly more time than expected (about 2 hours for just 700 rows of data).
May I know which type of cluster I should use, and what driver size?
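For reference, here is a minimal sketch of the pattern I described; `my_table` and `process_row` are placeholders standing in for my actual source table and per-row logic:

```python
# Sketch of the pattern in question: collect the whole DataFrame to the
# driver, then loop over rows in Python. The loop runs single-threaded on
# the driver, so the workers sit idle regardless of driver size.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("my_table")  # placeholder source table

def process_row(row):
    ...  # actual per-row logic omitted

for row in df.collect():  # pulls all rows to the driver
    process_row(row)      # processed one at a time in driver-side Python
```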
Labels:
- Spark
Accepted Solutions
01-08-2025 12:35 AM
Hi @Avinash_Narala, good day!
For right-sizing the cluster, the recommended approach is hybrid node provisioning combined with autoscaling: define how many on-demand and spot instances the cluster should use, and enable autoscaling between a minimum and maximum number of instances so the cluster can scale up and down with the load. Please also refer to the documents below for more information.
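For illustration, here is a hypothetical cluster spec showing this hybrid setup. Field names follow the Databricks Clusters API; the node type, worker counts, and spot settings are placeholder assumptions to adapt, not recommendations for a specific workload:

```python
# Hypothetical cluster spec: hybrid on-demand/spot provisioning plus
# autoscaling. All values are illustrative; size them for your workload.
cluster_spec = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",  # assumed AWS node type; choose per workload
    "autoscale": {
        "min_workers": 2,  # floor the cluster scales down to when idle
        "max_workers": 8,  # ceiling it scales up to under load
    },
    "aws_attributes": {
        "first_on_demand": 2,                  # keep the first nodes on-demand for stability
        "availability": "SPOT_WITH_FALLBACK",  # remaining nodes on spot, falling back to on-demand
    },
}
```

A spec like this can be passed to the Clusters API when creating the cluster, or the same values can be set in the cluster UI.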
Please let me know if this helps, and leave a like if this information is useful; follow-ups are appreciated.
Kudos
Ayushi