Databricks Community

Aidzillafont · 06-27-2024

Hi All,

I am attempting to execute a workflow on various job clusters, including general-purpose and memory-optimized clusters. My main bottleneck is that data is being written to disk because I’m running out of RAM. This is due to the large dataset that I need to load into memory.

The size of the dataset is unavoidable, but the computations are straightforward.

How should I go about selecting the appropriate cluster? Is there a useful guide for choosing the right cluster that I could follow going forward?

Thanks,

Aidzillafont

Ravivarma · 06-27-2024

Hello @Aidzillafont,

Greetings!

The document linked below explains the compute configuration best practices:

Doc: https://docs.databricks.com/en/compute/cluster-config-best-practices.html
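As a rough illustration of the sizing question (this is not taken from the linked document), here is a minimal back-of-the-envelope sketch in Python. The function name `workers_needed`, the `usable_fraction` default of 0.6, and the `expansion_factor` default of 2.0 are all assumptions made for the example, not Databricks recommendations; measure your own workload and adjust them.

```python
import math


def workers_needed(dataset_gb, node_ram_gb, usable_fraction=0.6, expansion_factor=2.0):
    """Estimate how many worker nodes are needed to hold a dataset in RAM.

    All parameters are illustrative assumptions:
    - dataset_gb: size of the dataset as stored, in GB
    - node_ram_gb: RAM of one worker node, in GB
    - usable_fraction: share of node RAM assumed available for data
    - expansion_factor: assumed growth of the data once loaded into memory
    """
    if dataset_gb <= 0 or node_ram_gb <= 0:
        raise ValueError("dataset_gb and node_ram_gb must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")

    in_memory_gb = dataset_gb * expansion_factor
    usable_per_node_gb = node_ram_gb * usable_fraction
    # Round up: a partially filled node is still a whole node.
    return math.ceil(in_memory_gb / usable_per_node_gb)


if __name__ == "__main__":
    # Hypothetical numbers: a 500 GB dataset on workers with 128 GB RAM each.
    print(workers_needed(500, 128))
```

If the estimate comes out far above what general-purpose nodes can offer at a reasonable worker count, that is a hint to compare memory-optimized node types, which provide more RAM per core.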

I hope this helps you!

Regards,

Ravi

How to pick the right cluster for your workflow