<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How can I change the parquet compression algorithm from gzip to something else? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30397#M22035</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;What are my options if I don't need any compression when writing my DataFrame to HDFS in Parquet format?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 28 Jul 2016 21:01:24 GMT</pubDate>
    <dc:creator>karthik_thati</dc:creator>
    <dc:date>2016-07-28T21:01:24Z</dc:date>
    <item>
      <title>How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30394#M22032</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Spark, by default, uses gzip to store parquet files. I would like to change the compression algorithm from gzip to snappy or lz4.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jul 2015 18:45:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30394#M22032</guid>
      <dc:creator>User16301467532</dc:creator>
      <dc:date>2015-07-15T18:45:24Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30395#M22033</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;You can set the Spark SQL property spark.sql.parquet.compression.codec.&lt;/P&gt;
&lt;P&gt;In SQL:&lt;/P&gt;
&lt;P&gt;%sql set spark.sql.parquet.compression.codec=snappy&lt;/P&gt;
&lt;P&gt;You can also set it on the sqlContext directly:&lt;/P&gt;
&lt;P&gt;sqlContext.setConf("spark.sql.parquet.compression.codec.", "snappy")&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jul 2015 18:46:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30395#M22033</guid>
      <dc:creator>User16301467532</dc:creator>
      <dc:date>2015-07-15T18:46:35Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30396#M22034</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Note that the snippet above has a slight typo: the property name should not end with a period.&lt;/P&gt;
&lt;P&gt;You can also set it on the sqlContext directly: sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")&lt;/P&gt;
&lt;P&gt;Unfortunately, it appears that lz4 isn't supported as a Parquet compression codec. I'm not sure why, as lz4 is supported for io.codec.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 07 May 2016 06:06:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30396#M22034</guid>
      <dc:creator>JohnCavanaugh</dc:creator>
      <dc:date>2016-05-07T06:06:30Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30397#M22035</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;What are my options if I don't need any compression when writing my DataFrame to HDFS in Parquet format?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jul 2016 21:01:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30397#M22035</guid>
      <dc:creator>karthik_thati</dc:creator>
      <dc:date>2016-07-28T21:01:24Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30398#M22036</link>
      <description>&lt;P&gt;@karthik.thati - Try this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df.write.option("compression","none").mode("overwrite").save("testoutput.parquet")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jul 2016 22:34:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30398#M22036</guid>
      <dc:creator>girivaratharaja</dc:creator>
      <dc:date>2016-07-28T22:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30399#M22037</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Jun 2017 16:26:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30399#M22037</guid>
      <dc:creator>sujoyDutta</dc:creator>
      <dc:date>2017-06-09T16:26:44Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30400#M22038</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;For uncompressed output, use:&lt;/P&gt;
&lt;P&gt;sqlContext.setConf("spark.sql.parquet.compression.codec", "&lt;B&gt;uncompressed&lt;/B&gt;")&lt;/P&gt;
&lt;P&gt;The highlighted value can be one of four: uncompressed, snappy, gzip, lzo.&lt;/P&gt;
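&lt;P&gt;As a rough sketch, the codec name can be checked against the four values listed above before building the SET statement used elsewhere in this thread. The helper build_set_statement and the constant SUPPORTED_CODECS are hypothetical names for illustration, not part of Spark, and the accepted codec list may differ between Spark versions.&lt;/P&gt;

```python
# Codec names listed in this thread; the accepted set may differ by Spark version.
SUPPORTED_CODECS = ("uncompressed", "snappy", "gzip", "lzo")


def build_set_statement(codec):
    """Return the SQL SET statement for the given Parquet compression codec.

    build_set_statement is a hypothetical helper, not part of Spark.
    """
    # Normalize the name so "Snappy" and " snappy " are treated alike.
    name = codec.strip().lower()
    if name not in SUPPORTED_CODECS:
        raise ValueError(
            "unsupported Parquet codec %r; expected one of %s"
            % (codec, ", ".join(SUPPORTED_CODECS))
        )
    return "set spark.sql.parquet.compression.codec=" + name


print(build_set_statement("Snappy"))
```

&lt;P&gt;The returned string could then be passed to spark.sql(...), so a misspelled or unsupported codec name such as lz4 fails early with a readable error.&lt;/P&gt;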
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Jun 2017 16:44:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30400#M22038</guid>
      <dc:creator>sujoyDutta</dc:creator>
      <dc:date>2017-06-09T16:44:23Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30401#M22039</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@prakash573: I believe Spark uses "Snappy" compression for Parquet files by default. I'm referring to the book "Learning Spark", Chapter 9, page 182, Table 9-3.&lt;/P&gt;
&lt;P&gt;Please correct me if this is wrong.&lt;/P&gt;
&lt;P&gt;Thank you,&lt;/P&gt;
&lt;P&gt;Venkat Anampudi&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 31 Dec 2017 14:31:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30401#M22039</guid>
      <dc:creator>venkat_anampudi</dc:creator>
      <dc:date>2017-12-31T14:31:29Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30402#M22040</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;spark.sql("set spark.sql.parquet.compression.codec=gzip");
&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 09:10:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30402#M22040</guid>
      <dc:creator>ZhenZeng</dc:creator>
      <dc:date>2019-10-01T09:10:05Z</dc:date>
    </item>
    <item>
      <title>Re: How can I change the parquet compression algorithm from gzip to something else?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30403#M22041</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Starting with Spark 2.1.0, "snappy" is the default Parquet compression codec; before that version, the default was "gzip".&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 10:47:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-change-the-parquet-compression-algorithm-from-gzip-to/m-p/30403#M22041</guid>
      <dc:creator>Pooja1</dc:creator>
      <dc:date>2020-01-16T10:47:34Z</dc:date>
    </item>
  </channel>
</rss>

