How do I set up the right cluster?

Kaher
New Contributor
 

Ybaselto
New Contributor II

Personally, once my data processing is optimized, I benchmark different setups to find the one that meets my processing-time goal for the fewest DBUs.
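The selection rule in that reply can be written down as a small sketch: drop every setup that misses the time goal, then keep the one that burns the fewest DBUs. It assumes a simplified cost model (total DBUs = cluster DBU rate per hour × runtime) and ignores autoscaling and startup time. The class, the function and the benchmark numbers are all hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One benchmarked cluster setup (all values come from your own runs)."""
    name: str
    runtime_hours: float  # measured wall-clock time of the job
    dbu_per_hour: float   # DBU rate of the whole cluster (driver + workers)

    @property
    def dbus(self) -> float:
        # Simplified cost model (assumption): DBUs = hourly rate x runtime
        return self.runtime_hours * self.dbu_per_hour


def cheapest_meeting_goal(runs, max_runtime_hours):
    """Return the run with the fewest DBUs among those within the time goal, or None."""
    eligible = [r for r in runs if r.runtime_hours <= max_runtime_hours]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.dbus)


# Hypothetical benchmark numbers, not real measurements.
runs = [
    BenchmarkRun("4 x small workers", runtime_hours=1.5, dbu_per_hour=6.0),
    BenchmarkRun("8 x small workers", runtime_hours=0.8, dbu_per_hour=11.0),
    BenchmarkRun("4 x large workers", runtime_hours=0.7, dbu_per_hour=14.0),
]

best = cheapest_meeting_goal(runs, max_runtime_hours=1.0)
print(best.name, round(best.dbus, 2))
```

With these made-up numbers the fastest setup is not the cheapest one that still meets the goal, which is the point of benchmarking instead of simply scaling up.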

Rheiman
Contributor II

For general cluster decision-making, refer to this article: https://docs.microsoft.com/en-gb/azure/databricks/clusters/cluster-config-best-practices

Once you've selected a cluster that makes sense, run your workload on it and check the Ganglia metrics to see whether you need a compute-, memory-, or storage-optimized cluster, then iterate from there.
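One way to make "check your Ganglia metrics" concrete is a rough decision rule over a few numbers you read off the Ganglia UI: average CPU use, peak memory use, and the share of CPU time spent waiting on I/O. This is a minimal sketch; the function name and every threshold are hypothetical starting points, not Databricks guidance.

```python
def suggest_cluster_family(avg_cpu_pct, peak_mem_pct, avg_io_wait_pct):
    """Map metrics read off the Ganglia UI to a node family to try next.

    All thresholds below are hypothetical; tune them on your own workloads.
    """
    MEM_HIGH = 85.0  # hypothetical: memory nearly full -> spill / OOM risk
    CPU_HIGH = 80.0  # hypothetical: cores are the bottleneck
    IO_HIGH = 20.0   # hypothetical: CPUs mostly waiting on disk

    # Memory pressure is checked first because it also causes disk spill and
    # extra CPU work (garbage collection) that could be misread as CPU/IO bound.
    if peak_mem_pct >= MEM_HIGH:
        return "memory-optimized"
    if avg_cpu_pct >= CPU_HIGH:
        return "compute-optimized"
    if avg_io_wait_pct >= IO_HIGH:
        return "storage-optimized"
    return "general purpose (consider fewer or smaller nodes)"


print(suggest_cluster_family(avg_cpu_pct=92, peak_mem_pct=55, avg_io_wait_pct=3))
```

After switching node family, rerun the same workload and read the metrics again; that loop is the "iterate from there" step.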

To simply check that your code works, it's best practice to start with a small subset of the data on a single-node cluster.
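A single-node cluster can be described as a request body for the Clusters API create call. The sketch below only builds and prints that body; the field names follow our reading of the Databricks single-node documentation and should be checked against the Clusters API reference for your workspace. The helper name and the `spark_version` / `node_type_id` values are hypothetical placeholders.

```python
import json


def single_node_cluster_spec(name, spark_version, node_type_id, idle_minutes=30):
    """Build (but do not send) a request body for a single-node cluster.

    spark_version and node_type_id must be values valid in your workspace.
    """
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": 0,  # no workers: Spark runs locally on the driver
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",  # use all cores of the driver
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
        "autotermination_minutes": idle_minutes,  # stop when idle to save DBUs
    }


# Hypothetical placeholder values; substitute real ones from your workspace.
spec = single_node_cluster_spec(
    name="dev-single-node",
    spark_version="<runtime-version>",
    node_type_id="<node-type>",
)
print(json.dumps(spec, indent=2))
```

Inside the notebook, standard PySpark calls such as `df.limit(n)` or `df.sample(fraction=...)` are an easy way to cut the input down before trying the full dataset on a bigger cluster.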


Great article. In the future, the serverless option will make this easier for newcomers.


My blog: https://databrickster.medium.com/