which type of cluster to use

Avinash_Narala — Wed, 08 Jan 2025 07:24:47 GMT

Hi,

Recently, I wrote some logic that collects a DataFrame and processes it row by row. I am using a 128 GB driver node, but it is taking far too long (about 2 hours for just 700 rows of data).

Which type of cluster should I use, and what driver size?

Re: which type of cluster to use

Ayushi_Suthar — Wed, 08 Jan 2025 08:35:24 GMT

Hi @Avinash_Narala, good day!

To right-size the cluster, the recommended approach is hybrid node provisioning combined with autoscaling. Define how many on-demand instances and how many spot instances the cluster should use, then enable autoscaling between a minimum and a maximum number of instances so the cluster can scale up and down with the load. Also, please refer to the below documents for more information.
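As a rough illustration of that hybrid setup, here is a minimal sketch of a cluster spec built in Python. It assumes an AWS workspace and uses the `autoscale` and `aws_attributes` field names as I understand them from the Databricks Clusters API, so please verify them against the docs for your cloud. The cluster name, the worker counts, and the helper `build_cluster_spec` are hypothetical, and required fields such as the runtime version and node type are left out on purpose.

```python
import json


def build_cluster_spec(min_workers, max_workers, on_demand_nodes):
    """Build a cluster spec dict with autoscaling and an on-demand/spot mix.

    Hypothetical helper for illustration only; it does not call any API.
    """
    if not 0 < min_workers <= max_workers:
        raise ValueError("need 0 < min_workers <= max_workers")
    return {
        "cluster_name": "hybrid-autoscale-example",  # hypothetical name
        # Autoscaling range: the cluster grows and shrinks between these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Assumed AWS-specific settings: the first N nodes are on-demand,
        # the rest are spot instances that fall back to on-demand.
        "aws_attributes": {
            "first_on_demand": on_demand_nodes,
            "availability": "SPOT_WITH_FALLBACK",
        },
    }


# Example values only; pick bounds that match your own workload.
spec = build_cluster_spec(min_workers=2, max_workers=8, on_demand_nodes=1)
print(json.dumps(spec, indent=2))
```

The printed JSON is only a fragment of a cluster definition; it would still need a runtime version and node types before it could be submitted anywhere.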

Please let me know if this helps, and leave a like if this information is useful. Follow-ups are appreciated.
Kudos
Ayushi
