Warehousing & Analytics

by User16826994223 • Honored Contributor III

06-25-2021 9:07:59 AM

1393 Views
1 reply
0 kudos

Resolved! What is Catalyst in Spark?

Latest Reply

User16826994223
Honored Contributor III

06-25-2021 9:08:16 AM

0 kudos

Spark SQL was designed around an optimizer called Catalyst, which is built on Scala's functional programming constructs. It has two main purposes: first, to make it easy to add new optimization techniques that solve problems specific to “big data”; and second, to allow developers to expa...
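To make the idea of a rule-based optimizer concrete, here is a toy sketch in plain Python. It is not Catalyst's API and not Spark code: the classes `Literal`, `Attribute`, `Add` and the helpers `transform_up`, `fold_constants`, `drop_zero`, `optimize` are all hypothetical names invented for illustration. It only mimics the general pattern Catalyst is known for, which is representing an expression as an immutable tree and applying small rewrite rules repeatedly until nothing changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical expression nodes; immutable, like the trees an optimizer rewrites.
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Apply `rule` to every node, children first, returning a new tree."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr: Expr) -> Expr:
    """Rewrite Add(Literal(a), Literal(b)) to Literal(a + b)."""
    if (
        isinstance(expr, Add)
        and isinstance(expr.left, Literal)
        and isinstance(expr.right, Literal)
    ):
        return Literal(expr.left.value + expr.right.value)
    return expr


def drop_zero(expr: Expr) -> Expr:
    """Rewrite x + 0 and 0 + x to x."""
    if isinstance(expr, Add):
        if expr.left == Literal(0):
            return expr.right
        if expr.right == Literal(0):
            return expr.left
    return expr


def optimize(expr: Expr, rules: List[Rule], max_iterations: int = 100) -> Expr:
    """Run all rules repeatedly until the tree stops changing (a fixed point)."""
    for _ in range(max_iterations):
        rewritten = expr
        for rule in rules:
            rewritten = transform_up(rewritten, rule)
        if rewritten == expr:
            return rewritten
        expr = rewritten
    return expr


# x + (1 + 2) is rewritten to x + 3
plan = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = optimize(plan, [fold_constants, drop_zero])
```

Adding a new optimization in this sketch means writing one more small function and appending it to the rule list, which is the kind of extensibility the reply describes.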

by Srikanth_Gupta_ • Databricks Employee

06-16-2021 5:39:16 AM

2179 Views
1 reply
1 kudos

Reading bulk CSV files from Spark

I am trying to read a 100GB csv.gz file from Spark, and the read is taking forever. What are the best options for reading this file faster?

Latest Reply

sean_owen
Databricks Employee

06-17-2021 4:06:32 PM

1 kudos

Part of the problem here is that .gz files are not splittable. If you have one huge 100GB .gz file, it can only be processed by a single task. Can you change your input to use a splittable compression format like .bz2? It'll work much better.
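If the input can be converted ahead of time, one possible way to do it is a one-off streaming recompression with the Python standard library. This is only a sketch: the function name `recompress_gzip_to_bz2` and the file paths are hypothetical, and the conversion itself still runs in a single process, so it is a one-time cost paid before the Spark job rather than a speedup of the conversion.

```python
import bz2
import gzip
import shutil


def recompress_gzip_to_bz2(src_path: str, dst_path: str,
                           chunk_size: int = 1024 * 1024) -> None:
    """Stream a .gz file into a .bz2 file without loading it into memory.

    Hypothetical helper: data is decompressed and recompressed in
    `chunk_size`-byte pieces, so memory use stays roughly constant
    regardless of the input size.
    """
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


# Example call (paths are placeholders, not from the original post):
# recompress_gzip_to_bz2("input.csv.gz", "input.csv.bz2")
```

The resulting .bz2 file can then be read with the usual CSV reader, and Spark can split it across multiple tasks instead of one.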
