Try this (in 1.4.0):
val blockSize = 1024 * 1024 * 16 // 16 MB
// dfs.blocksize sets the HDFS block size; parquet.block.size sets the
// Parquet row group size used when writing
sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize)
sc.hadoopConfiguration.setInt("parquet.block.size", blockSize)
Where sc is your SparkContext (not SQLContext).
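For context, a minimal sketch of the full flow in 1.4.0 (the app name and the input/output paths below are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-block-size"))
val sqlContext = new SQLContext(sc)

// Apply the smaller block / row group size before writing any Parquet
val blockSize = 1024 * 1024 * 16 // 16 MB
sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize)
sc.hadoopConfiguration.setInt("parquet.block.size", blockSize)

// Subsequent Parquet writes pick up the settings from hadoopConfiguration
val df = sqlContext.read.parquet("/path/to/input")   // hypothetical path
df.write.parquet("/path/to/output")                  // hypothetical path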
I think I'm experiencing something similar.
I'm not using S3 yet, but I'm reading Parquet tables into DataFrames and trying tactics like persist, coalesce, and repartition after reading from Parquet. I'm using HiveContext, if that matters. But I get the impression tha...
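Roughly what I'm doing, if it helps (the table name, partition counts, and storage level are placeholders, not my real values):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("parquet-read"))
val hiveContext = new HiveContext(sc)

// Read a Parquet-backed Hive table into a DataFrame
val df = hiveContext.table("my_parquet_table")   // hypothetical table name

// Tactics applied after the read
val repartitioned = df.repartition(200)          // arbitrary partition count
val coalesced = df.coalesce(10)                  // arbitrary partition count
df.persist(StorageLevel.MEMORY_AND_DISK)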