06-28-2022 03:02 PM
06-28-2022 10:11 PM
Personally, once my data processing is optimized, I benchmark different cluster setups to find the one that meets my processing-time goal for the fewest DBUs.
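To make that selection step concrete, here is a minimal sketch of how the comparison could be done once the benchmark runs are finished. The setup names, runtimes, and DBU rates below are made-up illustrative values, and `BenchmarkRun` and `cheapest_within_goal` are hypothetical helpers, not part of any Databricks API. Substitute the numbers you actually measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkRun:
    name: str               # hypothetical label for the cluster setup
    runtime_minutes: float  # measured wall-clock time of the job
    dbu_per_hour: float     # DBU rate of the whole cluster (assumed value)

    @property
    def dbu_cost(self) -> float:
        # DBUs consumed by one run = hourly rate * hours used
        return self.dbu_per_hour * self.runtime_minutes / 60.0


def cheapest_within_goal(runs: List[BenchmarkRun],
                         goal_minutes: float) -> Optional[BenchmarkRun]:
    """Return the run with the lowest DBU cost that meets the time goal."""
    eligible = [r for r in runs if r.runtime_minutes <= goal_minutes]
    if not eligible:
        return None  # no setup met the goal; benchmark larger setups
    return min(eligible, key=lambda r: r.dbu_cost)


# Illustrative numbers only, not real measurements.
runs = [
    BenchmarkRun("small-4-workers", runtime_minutes=50.0, dbu_per_hour=6.0),
    BenchmarkRun("medium-8-workers", runtime_minutes=24.0, dbu_per_hour=12.0),
    BenchmarkRun("large-16-workers", runtime_minutes=15.0, dbu_per_hour=24.0),
]

best = cheapest_within_goal(runs, goal_minutes=30.0)
if best is not None:
    print(f"{best.name}: {best.dbu_cost:.1f} DBU per run")
```

With these sample values the smallest setup misses the 30-minute goal, and the medium setup is cheaper per run than the large one even though it is slower.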
06-29-2022 10:59 PM
For general guidance on choosing a cluster, refer to this article: https://docs.microsoft.com/en-gb/azure/databricks/clusters/cluster-config-best-practices
Once you've selected a cluster that makes sense, run your workload on it and check the Ganglia metrics to see whether you need a compute-optimized, memory-optimized, or storage-optimized cluster, then iterate from there.
If you just want to see whether your code works, the best practice is to start with a small set of data on a single node.
07-11-2022 08:27 AM
Great article. In the future, the serverless option will make this easier for newcomers.
My blog: https://databrickster.medium.com/