03-01-2023 06:03 AM
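Hi @Ankit Gangwal, the problem with zip files is that they are not splittable, so Spark has to read each file in a single task on one core. It is better to store the data in a splittable layout, for example Parquet or ORC with snappy compression. Snappy compresses each block inside those files independently, which lets Spark split a file and distribute the workload across the cluster. (A plain snappy-compressed text file is generally not splittable on its own; the container format is what makes the split possible.)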
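To illustrate why splittability matters, here is a minimal standard-library Python sketch. It is not Spark code. zlib (deflate, the algorithm zip and gzip also use) stands in for both codecs because snappy is not in the standard library. The sample data, `BLOCK_SIZE`, and all function names are hypothetical. One continuous stream cannot be decoded starting from an arbitrary offset, while independently compressed blocks can each be handed to a separate worker.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample content standing in for a large text file.
DATA = b"".join(f"row {i},value {i * i}\n".encode() for i in range(10_000))
# Hypothetical block size; Parquet/ORC choose their own page/stripe sizes.
BLOCK_SIZE = 64 * 1024


def compress_whole(data: bytes) -> bytes:
    """One continuous deflate stream, like a single zip/gzip entry: not splittable."""
    return zlib.compress(data)


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Independently compressed blocks, similar in spirit to snappy pages in Parquet/ORC."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_from_middle(stream: bytes) -> bool:
    """Return True only if the stream can be decoded starting halfway through."""
    try:
        zlib.decompress(stream[len(stream) // 2:])
        return True
    except zlib.error:
        return False


def parallel_decompress(blocks: list[bytes]) -> bytes:
    """Decode each block on its own worker, much as Spark assigns splits to tasks."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(zlib.decompress, blocks))
```

In Spark itself the practical step would be a one-time conversion: read the zip data once and write it back as Parquet, which Spark compresses with snappy by default.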
Reference: https://www.linkedin.com/pulse/apache-spark-optimizations-compression-deepak-rajak