11-21-2021 11:54 PM
I can't give you that: there is no single ideal cluster size or scenario.
However, the latest Databricks runtime version is a good choice (10.0, or the latest LTS for production jobs).
For data jobs, the storage-optimized nodes are a good choice, as they can use the Delta cache.
For online querying: Databricks SQL.
I myself use the cheapest node type that can handle the job, and that depends on which Spark program I run. So I use multiple cluster configurations.
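
To illustrate what "multiple cluster configurations" can look like, here is a minimal sketch that builds one payload per job type, in the shape the Databricks Clusters API accepts (`spark_version`, `node_type_id`, `num_workers`, `spark_conf`). The runtime string, the AWS node types and the job names are assumptions picked for illustration, not recommendations; substitute whatever is cheapest for your own workload.

```python
def cluster_config(name, node_type_id, num_workers,
                   spark_version="9.1.x-scala2.12",  # assumed LTS runtime key
                   delta_cache=False):
    """Build a cluster spec dict for one job type."""
    spark_conf = {}
    if delta_cache:
        # Explicitly switch the Delta cache on; it only helps on nodes with local SSDs.
        spark_conf["spark.databricks.io.cache.enabled"] = "true"
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "spark_conf": spark_conf,
    }


# Hypothetical jobs: a small upsert on one worker, a heavier ETL on cached nodes.
configs = {
    "small_upsert": cluster_config("small-upsert", "m5.xlarge", 1),
    "heavy_etl": cluster_config("heavy-etl", "i3.xlarge", 4, delta_cache=True),
}
```

Each dict could then be used as the `new_cluster` block of a job definition, so every job gets the smallest cluster that still finishes it.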
I even run upsert jobs with a single worker on a table of over 300 million records; it works fine, depending on how much data has to be rewritten.
That in turn depends on the filters, transformations, etc. applied to those 300 million records.
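
A rough sketch of such an upsert with the Delta Lake Python API is below. The idea is that an extra filter on the target in the merge condition can limit how many files have to be rewritten, which is what keeps a single worker viable. The table name, key columns and filter in the usage lines are hypothetical, and the helper names are mine, not part of any library.

```python
def merge_condition(key_cols, target_filter=None):
    """Join target (t) and source (s) on the keys, optionally narrowing the target."""
    cond = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    if target_filter:
        cond += f" AND {target_filter}"
    return cond


def upsert(spark, target_table, updates_df, key_cols, target_filter=None):
    """Merge updates_df into an existing Delta table (update matches, insert the rest)."""
    # Imported lazily so the module loads without delta-spark installed.
    from delta.tables import DeltaTable

    (DeltaTable.forName(spark, target_table).alias("t")
        .merge(updates_df.alias("s"), merge_condition(key_cols, target_filter))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
```

For example, `upsert(spark, "sales.orders", updates_df, ["order_id"], "t.order_date >= '2021-11-01'")` would only consider recent rows of the hypothetical `sales.orders` table as merge candidates. Note that source rows whose match falls outside that filter are inserted as new rows, so the filter has to cover every row the updates can touch.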