How to pick the right cluster for your workflow

Aidzillafont — Thu, 27 Jun 2024 11:25:12 GMT

Hi All,

I am attempting to execute a workflow on various job clusters, including general-purpose and memory-optimized clusters. My main bottleneck is that data is being written to disk because I’m running out of RAM. This is due to the large dataset that I need to load into memory.

The size of the dataset is unavoidable, but the computations are straightforward.

How should I go about selecting the appropriate cluster? Is there a useful guide for choosing the right cluster that I could follow going forward?

Thanks,

Aidzillafont

Re: How to pick the right cluster for your workflow

Ravivarma — Thu, 27 Jun 2024 12:49:35 GMT

Hello @Aidzillafont,

Greetings!

Please see the document below, which explains compute configuration best practices:

Doc: https://docs.databricks.com/en/compute/cluster-config-best-practices.html
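As a rough first pass alongside the guide, the sizing question can be framed as simple arithmetic: how many workers of a given memory size would it take to hold the dataset without spilling to disk? The sketch below is only an illustration. The node names and memory sizes are hypothetical, and the `overhead_factor` and `usable_fraction` defaults are assumptions you would need to tune for your own workload, not values from the Databricks documentation.

```python
import math


def workers_needed(dataset_gb, node_memory_gb,
                   overhead_factor=2.0, usable_fraction=0.6):
    """Estimate how many workers are needed to keep a dataset in memory.

    overhead_factor: assumed in-memory expansion of the dataset
                     (deserialization, intermediate copies).
    usable_fraction: assumed share of each node's RAM that is actually
                     available for data.
    Both defaults are illustrative guesses, not documented values.
    """
    required_gb = dataset_gb * overhead_factor
    usable_per_node_gb = node_memory_gb * usable_fraction
    return math.ceil(required_gb / usable_per_node_gb)


# Hypothetical node types: name -> RAM per worker in GB.
candidates = {
    "general_purpose_64gb": 64,
    "memory_optimized_128gb": 128,
}

# Compare how many workers each node type would need for a 500 GB dataset.
for name, memory_gb in candidates.items():
    print(name, workers_needed(500, memory_gb))
```

If the computations are simple and memory is the only constraint, comparing the worker counts (and their hourly cost) across node types like this gives a starting point that you can then validate against the spill metrics of an actual run.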

I hope this helps you!

Regards,

Ravi
