Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Forum Posts

Srikanth_Gupta_
by Valued Contributor
  • 1912 Views
  • 1 replies
  • 1 kudos

Reading bulk CSV files from Spark

I'm trying to read a 100 GB csv.gz file with Spark, and it is taking forever. What are the best options for reading this file faster?

Latest Reply
sean_owen
Honored Contributor II
  • 1 kudos

Part of the problem here is that .gz files are not splittable. If you have one huge 100 GB .gz file, it can only be processed by a single task, no matter how many executors the cluster has. Can you change your input to use a splittable compression format like .bz2? It'll work much better.
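If the upstream producer can't be changed, one option is to recompress the file once before handing it to Spark. Here is a minimal sketch using only the Python standard library; the function name `recompress_gz_to_bz2` and the chunk size are my own choices for illustration, not anything from Spark or Databricks. It streams the data, so it never holds the whole file in memory, but it is still a single-threaded pass and will take a while on 100 GB.

```python
import bz2
import gzip
import shutil


def recompress_gz_to_bz2(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Stream a gzip file into a bzip2 file, one chunk at a time.

    Hypothetical helper for illustration. chunk_size is an arbitrary
    buffer size in bytes, not a tuned value.
    """
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        # copyfileobj reads and writes in chunk_size pieces,
        # so memory use stays flat regardless of file size.
        shutil.copyfileobj(src, dst, chunk_size)


# Example call (paths are placeholders):
# recompress_gz_to_bz2("data.csv.gz", "data.csv.bz2")
```

Spark picks the codec from the file extension, so the resulting .bz2 file can be read with the same `spark.read.csv(...)` call as before, and the read can then be spread across multiple tasks.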
