06-27-2024 07:13 AM - edited 06-27-2024 07:16 AM
Hi @lprevost,
I don't think there's a way to "checkpoint partitions" in the way you described.
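For the gzip files, your executor is probably running out of memory during decompression. Gzip is not a splittable format, so Spark reads each .gz file as a single partition, and one task on one executor has to decompress that whole file. One of the few solutions that doesn't require changing your source files is to increase the executor memory.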
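A minimal PySpark sketch of raising executor memory, assuming you create the session yourself. The sizes `16g` and `4g` are placeholders, not recommendations, and `build_session`/`as_cluster_spark_config` are hypothetical helper names. `spark.executor.memory` is fixed when the application starts, so on Databricks you would normally put the same keys in the cluster's Spark config (or pick a worker type with more memory) rather than set them from a notebook.

```python
# Placeholder sizes: tune them to the largest *decompressed* gzip file you read.
EXECUTOR_CONF = {
    "spark.executor.memory": "16g",         # JVM heap per executor
    "spark.executor.memoryOverhead": "4g",  # off-heap headroom per executor
}


def build_session(app_name="gzip-ingest", conf=EXECUTOR_CONF):
    """Create a SparkSession with the executor memory settings applied.

    These settings only take effect for a new application. On an already
    running cluster (e.g. a Databricks notebook), getOrCreate() returns the
    existing session and static settings like these are not changed.
    """
    # Imported here so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def as_cluster_spark_config(conf=EXECUTOR_CONF):
    """Render the settings as 'key value' lines for a cluster's Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())
```

Printing `as_cluster_spark_config()` gives the two lines you would paste into the cluster's Spark config before restarting it.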
To process gzip files in parallel, this library might be of interest to you, although, given the way it works, I don't think it would address the memory issue: https://github.com/nielsbasjes/splittablegzip
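A minimal PySpark sketch of reading gzip CSV files through that library's codec. It assumes the library's jar is already installed on the cluster (see the project README for the Maven coordinates) and that the codec class is `nl.basjes.hadoop.io.compress.SplittableGzipCodec`, as the README documents. The CSV format, header option, `128MB` split size, and the helper names `splittable_gzip_options`/`read_gzip_csv` are hypothetical placeholders.

```python
SPLITTABLE_GZIP_CODEC = "nl.basjes.hadoop.io.compress.SplittableGzipCodec"


def splittable_gzip_options(header=True):
    """Reader options that ask Spark to split .gz files via the library's codec."""
    return {
        "header": str(header).lower(),
        # Passed through to the Hadoop configuration used by the file reader.
        "io.compression.codecs": SPLITTABLE_GZIP_CODEC,
    }


def read_gzip_csv(spark, path, max_partition_bytes="128MB"):
    """Read gzip CSV files so that a single large file can span several tasks."""
    # Upper bound on bytes per split: smaller values mean more, smaller tasks per file.
    spark.conf.set("spark.sql.files.maxPartitionBytes", max_partition_bytes)
    reader = spark.read
    for key, value in splittable_gzip_options().items():
        reader = reader.option(key, value)
    return reader.csv(path)
```

Usage would look like `df = read_gzip_csv(spark, "/path/to/files/*.csv.gz")`, where the path is a placeholder. Each split still has to decompress the file from its start up to its own offset, which is consistent with the caveat above that this parallelizes the work without necessarily lowering memory pressure.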
Best regards,
Raphael Balogo
Sr. Technical Solutions Engineer
Databricks