10-23-2022 01:54 PM
Your chart looks nice (compared to others I have seen); it seems you are utilizing your cluster correctly.
You can check the Spark UI for data spills: do the Spark partitions fit in memory?
Any cluster should be okay if the disk partitions of your tables are around 200 MB each (the optimal size). Beyond that, of course, you can benchmark. Usually, newer generations of machines are a bit faster.
To optimize tables where you append data, it is good to use disk partitioning per date or month. Then run OPTIMIZE with a WHERE clause to limit it to only the new partitions.
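As a rough sketch of that last step, assuming a Delta table partitioned by a date column: the helper below only builds the OPTIMIZE statement as a string. The table name `events`, the column `event_date`, and the function `build_optimize_sql` are made up for illustration; adjust them to your own schema.

```python
from datetime import date


def build_optimize_sql(table: str, partition_col: str, since: date) -> str:
    """Build an OPTIMIZE statement limited to partitions on or after `since`.

    `table` and `partition_col` are hypothetical names supplied by the caller.
    The WHERE predicate should reference partition columns only.
    """
    return f"OPTIMIZE {table} WHERE {partition_col} >= '{since.isoformat()}'"


if __name__ == "__main__":
    # Example: compact only the partitions appended since 1 October 2022.
    stmt = build_optimize_sql("events", "event_date", date(2022, 10, 1))
    print(stmt)
    # In a Databricks notebook you could then run: spark.sql(stmt)
```

Restricting the predicate to the newly appended partitions means older partitions that are already compacted are not scanned again on every run.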
My blog: https://databrickster.medium.com/