Hi
I'm using the Parquet format to store raw data; the part files are stored on S3.
I would like to control the file size of each Parquet part file.
I tried this:
sqlContext.setConf("spark.parquet.block.size", SIZE.toString)
sqlContext.setCon...
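For context, here is a fuller sketch of the write path I have in mind (a sketch only, assuming Spark 1.4+ with SQLContext; the paths and the 256 MB value are placeholders, and parquet.block.size is the Parquet writer's row-group size property, set here on the Hadoop configuration rather than through setConf):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-writer"))
val sqlContext = new SQLContext(sc)

// parquet.block.size is the row-group size the Parquet writer targets;
// 256 MB here is an illustrative value, not what we actually run.
sc.hadoopConfiguration.setInt("parquet.block.size", 256 * 1024 * 1024)

// Write the raw data as Parquet; the S3 paths are placeholders.
val df = sqlContext.read.json("s3n://my-bucket/raw-input/")
df.write.parquet("s3n://my-bucket/raw-parquet/")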
I'm also running joins between Parquet datasets stored on S3,
but it seems that Spark reads all of the data anyway: we don't see better performance when we change the queries.
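For illustration, the kind of join we run looks roughly like this (table paths and column names are placeholders; spark.sql.parquet.filterPushdown is the setting I believe controls predicate pushdown into the Parquet reader):

// Enable Parquet predicate pushdown, so filters can prune row groups
// instead of forcing a full scan.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

// Paths and column names below are placeholders.
val events = sqlContext.read.parquet("s3n://my-bucket/raw-parquet/events")
val users  = sqlContext.read.parquet("s3n://my-bucket/raw-parquet/users")

val joined = events
  .filter(events("day") === "2015-06-01")       // the filter we hoped would limit the read
  .join(users, events("userId") === users("id"))

joined.count()

If pushdown worked as expected, the day filter should cut the bytes read from S3 noticeably; since the query times stay flat, it looks like the whole dataset is being scanned regardless.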
I need to keep investigating this point, as it's not yet clear to me what is happening.