07-15-2015 11:45 AM
Spark, by default, uses gzip to store parquet files. I would like to change the compression algorithm from gzip to snappy or lz4.
07-15-2015 11:46 AM
You can set the following spark sql property spark.sql.parquet.compression.codec.
In sql:
%sql set spark.sql.parquet.compression.codec=snappy
You can also set in the sqlContext directly:
sqlContext.setConf("spark.sql.parquet.compression.codec.", "snappy")
05-06-2016 11:06 PM
Note that the setConf call above has a slight typo: the property name should not end with a trailing dot.
You can also set in the sqlContext directly: sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
Unfortunately, it appears that lz4 isn't supported as a Parquet compression codec. I'm not sure why, as lz4 is supported for io.codec.
07-28-2016 02:01 PM
What are the options if I don't need any compression when writing my DataFrame to HDFS in Parquet format?
06-09-2017 09:26 AM
sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")
07-28-2016 03:34 PM
@karthik.thati - Try this
df.write.option("compression","none").mode("overwrite").save("testoutput.parquet")
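The per-write approach above can be wrapped in a small helper. This is only a minimal sketch: it assumes `df` is a Spark DataFrame on a Spark version whose writer supports the `compression` option, and `write_parquet` is a hypothetical name, not part of Spark.

```python
def write_parquet(df, path, codec="snappy"):
    """Write `df` to `path` as Parquet using `codec` for this write only.

    `write_parquet` is a hypothetical helper; `df` is assumed to be a
    Spark DataFrame.
    """
    # The writer's "compression" option applies to this write only and
    # takes precedence over spark.sql.parquet.compression.codec.
    (df.write
       .option("compression", codec)
       .mode("overwrite")
       .parquet(path))
```

For example, `write_parquet(df, "testoutput.parquet", "none")` would write the output uncompressed without touching the session-wide setting.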
06-09-2017 09:44 AM
For uncompressed use
sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")
The value (the second argument) can be one of four: uncompressed, snappy, gzip, lzo
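To catch a mistyped codec name before it reaches Spark, the setting can go through a small check first. This is a rough sketch: `set_parquet_codec` and `THREAD_CODECS` are hypothetical names, the allowed list is just the four values mentioned in this thread (other Spark versions may accept more), and `sql_context` is assumed to be any object with a `setConf(key, value)` method, such as a `SQLContext`.

```python
PARQUET_CODEC_KEY = "spark.sql.parquet.compression.codec"

# Codec names listed in this thread; an assumption, not an exhaustive
# list for every Spark version.
THREAD_CODECS = ("uncompressed", "snappy", "gzip", "lzo")


def set_parquet_codec(sql_context, codec):
    """Validate `codec` and set it as the session-wide Parquet codec.

    Returns the normalized (lower-cased, trimmed) codec name.
    """
    name = codec.strip().lower()
    if name not in THREAD_CODECS:
        raise ValueError(
            "unsupported codec %r; expected one of %s"
            % (codec, ", ".join(THREAD_CODECS))
        )
    # Note: no trailing dot in the property name.
    sql_context.setConf(PARQUET_CODEC_KEY, name)
    return name
```

Usage would look like `set_parquet_codec(sqlContext, "snappy")`; passing "lz4" raises a `ValueError` instead of being handed on to Spark.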
12-31-2017 06:31 AM
@prakash573:
I believe Spark uses "Snappy" compression for Parquet files by default. I'm referring to "Learning Spark", Chapter 9, page 182, Table 9-3.
Please confirm if this is not correct.
Thank You
Venkat Anampudi
01-16-2020 02:47 AM
Starting from Spark version 2.1.0, "snappy" is the default compression codec; before that version, "gzip" was the default compression format in Spark.
10-01-2019 02:10 AM
spark.sql("set spark.sql.parquet.compression.codec=gzip");