Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Forum Posts

User16826994223
by Honored Contributor III
  • 1393 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Spark SQL was designed around an optimizer called Catalyst, which is built on Scala's functional programming constructs. It has two main purposes: first, to add new optimization techniques that address problems specific to “big data”, and second, to allow developers to expa...

Srikanth_Gupta_
by Databricks Employee
  • 2179 Views
  • 1 reply
  • 1 kudos

Reading bulk CSV files from Spark

I'm trying to read a 100 GB csv.gz file from Spark, and it is taking forever. What are the best options for reading this file faster?

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

Part of the problem here is that .gz files are not splittable. If you have one huge 100 GB .gz file, it can only be processed by a single task. Can you change your input to use a splittable compression format like .bz2? It will work much better.
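
If the upstream producer can't be changed, one option is a one-time re-compression before Spark ever sees the file. Here's a minimal sketch using only the Python standard library; the function name `recompress_gz_to_bz2` and the file paths are made up for illustration, and it assumes the file sits on a filesystem the script can read and write directly.

```python
import bz2
import gzip
import shutil


def recompress_gz_to_bz2(src_path: str, dst_path: str,
                         chunk_size: int = 1024 * 1024) -> None:
    """Stream a .gz file into a .bz2 file without loading it into memory.

    Hypothetical helper: data is copied in chunks of `chunk_size` bytes,
    so memory use stays flat regardless of the input size.
    """
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


if __name__ == "__main__":
    # Example paths only; replace with your own.
    recompress_gz_to_bz2("data.csv.gz", "data.csv.bz2")
```

This conversion is itself single-threaded and bzip2 is slow to compress, so for 100 GB it will take a while, but you only pay that cost once. After that, reading the .bz2 file with `spark.read.csv` should let Spark break the input into multiple tasks instead of one.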
