Hi
I'm using the Parquet format to store raw data; the part files are stored on S3.
I would like to control the file size of each Parquet part file.
I tried this:
sqlContext.setConf("spark.parquet.block.size", SIZE.toString)
sqlContext.setCon...
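For context, here is a fuller sketch of the write path I have in mind (a sketch only, assuming Spark 1.4+ with SQLContext; the paths and the 256 MB value are placeholders, and parquet.block.size is the Parquet writer's row-group size property, set here on the Hadoop configuration rather than through setConf):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-writer"))
val sqlContext = new SQLContext(sc)

// parquet.block.size is the row-group size the Parquet writer targets;
// 256 MB here is an illustrative value, not what we actually run.
sc.hadoopConfiguration.setInt("parquet.block.size", 256 * 1024 * 1024)

// Write the raw data as Parquet; the S3 paths are placeholders.
val df = sqlContext.read.json("s3n://my-bucket/raw-input/")
df.write.parquet("s3n://my-bucket/raw-parquet/")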
I'm also running joins between Parquet datasets stored on S3,
but it seems that Spark reads all of the data anyway: we don't see better performance when we change the queries.
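For illustration, the kind of join we run looks roughly like this (table paths and column names are placeholders; spark.sql.parquet.filterPushdown is the setting I believe controls predicate pushdown into the Parquet reader):

// Enable Parquet predicate pushdown, so filters can prune row groups
// instead of forcing a full scan.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

// Paths and column names below are placeholders.
val events = sqlContext.read.parquet("s3n://my-bucket/raw-parquet/events")
val users  = sqlContext.read.parquet("s3n://my-bucket/raw-parquet/users")

val joined = events
  .filter(events("day") === "2015-06-01")       // the filter we hoped would limit the read
  .join(users, events("userId") === users("id"))

joined.count()

If pushdown worked as expected, the day filter should cut the bytes read from S3 noticeably; since the query times stay flat, it looks like the whole dataset is being scanned regardless.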
I need to keep investigating this point, as it's not yet clear to me what is happening.