Reading bulk CSV files from Spark

Srikanth_Gupta_ — Thu, 20 Mar 2025 17:19:55 GMT

I'm trying to read a 100GB csv.gz file with Spark, and it is taking forever. What are the best options for reading this file faster?

Re: Reading bulk CSV files from Spark

sean_owen — Thu, 17 Jun 2021 23:06:32 GMT

Part of the problem here is that .gz files are not splittable. If you have one huge 100GB .gz file, it can only be processed by a single task, so the read gets no parallelism no matter how many executors you have. Can you change your input to use a splittable compression format like .bz2? It'll work much better.
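If you can run a one-off preprocessing step on a machine that can see the file, here is a minimal sketch using only the Python standard library. It shows two ways to get parallelism back: re-compress the single .gz as one splittable .bz2, or split it into many smaller .gz parts so Spark can assign one task per file. The helper names `recompress_gz_to_bz2` and `split_gz_csv` are hypothetical. The splitter also assumes the CSV has a single header line and no quoted fields with embedded newlines, because it splits on raw line boundaries.

```python
import bz2
import gzip
import shutil


def recompress_gz_to_bz2(src_path: str, dst_path: str, chunk_size: int = 16 * 1024 * 1024) -> None:
    """Stream-decompress a .gz file and re-compress it as .bz2 (a splittable codec for Spark/Hadoop).

    Data is copied in chunks, so the whole file is never held in memory.
    """
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


def split_gz_csv(src_path: str, dst_prefix: str, lines_per_part: int) -> list[str]:
    """Split one gzip'd CSV into several gzip'd parts, repeating the header in each part.

    Assumption: no quoted field contains a newline, so every physical line is one record.
    Returns the list of part paths that were written.
    """
    parts: list[str] = []
    with gzip.open(src_path, "rb") as src:
        header = src.readline()
        out = None
        count = 0
        for line in src:
            # Start a new part before the first record, or once the current part is full.
            if out is None or count == lines_per_part:
                if out is not None:
                    out.close()
                path = f"{dst_prefix}-{len(parts):05d}.csv.gz"
                out = gzip.open(path, "wb")
                out.write(header)
                parts.append(path)
                count = 0
            out.write(line)
            count += 1
        if out is not None:
            out.close()
    return parts
```

Either conversion is a one-time, single-threaded cost. After that, pointing Spark at the output, for example `spark.read.option("header", "true").csv("/path/to/parts/")` with a hypothetical path, should let it read the parts (or the .bz2 splits) in parallel. Note that bz2 decompression is more CPU-intensive than gzip, so many moderately sized .gz parts can be the faster choice when you control how the data is produced.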
