Databricks Community

aladda · ‎06-23-2021

gzip is not a splittable format, so Spark has to read the whole file in a single task; the load is sequential and therefore slow. You can either split the CSV into parts, gzip each part separately, and load them together, or switch to bzip2, which is a splittable compression format and works better for parallel reads.

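You can also try pigz, a parallel implementation of gzip: https://ostechnix.com/pigz-compress-and-decompress-files-in-parallel-in-linux/ Note that pigz writes ordinary gzip files, so it speeds up compressing and decompressing outside Spark rather than making a single file splittable.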
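The first suggestion above (split the CSV into parts and gzip each part separately) can be sketched with the Python standard library alone. This is a minimal sketch, not a Databricks-specific tool: the function name `split_csv_to_gzip_parts`, the `part-00000.csv.gz` naming scheme, and the `rows_per_part` parameter are all assumptions made for illustration. It also assumes the file is UTF-8 and has a header row, which is repeated in every part.

```python
import csv
import gzip
from pathlib import Path


def split_csv_to_gzip_parts(src, out_dir, rows_per_part):
    """Split the CSV at `src` into gzipped parts of up to `rows_per_part` data rows.

    Hypothetical helper for illustration. Each part repeats the header row,
    so every part is a valid CSV on its own. Returns the list of part paths.
    """
    if rows_per_part < 1:
        raise ValueError("rows_per_part must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    writer = None
    rows_in_part = 0

    # Read the source as gzip if it ends in .gz, otherwise as plain text.
    opener = gzip.open if str(src).endswith(".gz") else open
    try:
        with opener(src, "rt", newline="", encoding="utf-8") as f:
            # csv.reader keeps quoted fields with embedded newlines intact,
            # which a naive line-based split would break.
            reader = csv.reader(f)
            header = next(reader, None)
            if header is None:
                return parts  # empty input, nothing to write
            for row in reader:
                if writer is None or rows_in_part >= rows_per_part:
                    if out is not None:
                        out.close()
                    path = out_dir / f"part-{len(parts):05d}.csv.gz"
                    out = gzip.open(path, "wt", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    parts.append(path)
                    rows_in_part = 0
                writer.writerow(row)
                rows_in_part += 1
    finally:
        if out is not None:
            out.close()
    return parts
```

Once the parts are in storage the cluster can reach, something like `spark.read.csv("/path/to/parts/*.csv.gz", header=True)` (the path is a placeholder) lets Spark assign the parts to separate tasks instead of one. The csv module may normalize quoting when it rewrites rows, so the parts are equivalent to the source as data but not necessarily byte-for-byte identical.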

How can I speed up the loading of a large zipped CSV file in Databricks?
