03-01-2023 04:23 AM
Is there a better way to load a huge zipped CSV file into a hive_metastore table?
Accepted Solutions
03-01-2023 06:03 AM
Hi @Ankit Gangwal, the problem with zip files is that they are not splittable, so only one core can process them. It is better to switch to a splittable compression format such as snappy, which lets Spark distribute the workload across the cluster.
Ref link:- https://www.linkedin.com/pulse/apache-spark-optimizations-compression-deepak-rajak
03-01-2023 05:41 AM
@Ankit Gangwal
Scale up your cluster.

03-16-2023 09:53 PM
Hi @Ankit Gangwal,
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!

