04-03-2023 04:32 AM
Here are a few options I tried that gave me better performance:
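- Caching the intermediate or final results, so that the DataFrame is not recomputed from scratch when it is written (this matters when the same DataFrame is used by more than one action, e.g. a count followed by the write).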
- Coalescing the results into 1x or 0.5x as many partitions as you have cores, while also making sure each partition is at least 128 MB, so that the writes happen efficiently.
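The two coalescing rules can pull in opposite directions: a small output on a large cluster would give tiny partitions if you used 1x the core count. Below is a minimal sketch of one way to reconcile them, assuming you can estimate the total output size in bytes. The helper name `target_partitions` and the way the rules are combined (take the core-based count, then lower it until each partition is at least 128 MB) are my own assumptions for illustration, not a Spark or Databricks API.

```python
# Hypothetical helper: not part of Spark/Databricks. It combines the two
# rules of thumb above under the assumption that total output size is known.
BLOCK_BYTES = 128 * 1024 * 1024  # 128 MB minimum per output partition


def target_partitions(total_bytes: int, num_cores: int, factor: float = 1.0) -> int:
    """Suggest a partition count to pass to coalesce() before a write.

    Starts from factor * num_cores (factor 1.0 or 0.5 per the rule of thumb),
    then lowers it if needed so each partition holds at least BLOCK_BYTES.
    Always returns at least 1.
    """
    if total_bytes < 0 or num_cores < 1 or factor <= 0:
        raise ValueError("need total_bytes >= 0, num_cores >= 1, factor > 0")
    # Core-based target, e.g. 32 cores * 0.5 -> 16 partitions.
    by_cores = max(1, int(num_cores * factor))
    # Largest partition count that still keeps every partition >= 128 MB.
    by_size = max(1, total_bytes // BLOCK_BYTES)
    return min(by_cores, by_size)
```

For example, 1 GiB of output on 32 cores gives 8 partitions (the size rule wins), while 64 GiB on 32 cores gives 32 (or 16 with `factor=0.5`). In PySpark the result would then be used as `df.cache()` before the actions that reuse the DataFrame and `df.coalesce(n)` before the write. Note that `coalesce` can only reduce the number of partitions; if the DataFrame currently has fewer partitions than `n`, `repartition(n)` is needed instead, at the cost of a shuffle.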