<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic how to zip a dataframe in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13214#M7928</link>
    <description>&lt;P&gt;How do I zip a DataFrame so that I get a zipped CSV output file? Please share the command. Only one DataFrame is involved, not multiple.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Oct 2021 22:13:54 GMT</pubDate>
    <dc:creator>amitdatabricksc</dc:creator>
    <dc:date>2021-10-15T22:13:54Z</dc:date>
    <item>
      <title>how to zip a dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13214#M7928</link>
      <description>&lt;P&gt;How do I zip a DataFrame so that I get a zipped CSV output file? Please share the command. Only one DataFrame is involved, not multiple.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Oct 2021 22:13:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13214#M7928</guid>
      <dc:creator>amitdatabricksc</dc:creator>
      <dc:date>2021-10-15T22:13:54Z</dc:date>
    </item>
    <item>
      <title>Re: how to zip a dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13215#M7929</link>
      <description>&lt;P&gt;If you are using PySpark, you can do something like the following:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# "path" is an output directory; Spark writes the compressed CSV part file inside it
df.coalesce(1).write.option("compression", "gzip").csv("path")
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Note that coalesce(1) reduces the DataFrame to a single partition so that the output is saved as a single file. This writes gzip-compressed output (.csv.gz), not a .zip archive. In addition to "gzip", you can use "bzip2", "lz4", "snappy", and "deflate".&lt;/P&gt;&lt;P&gt;If you are using pandas rather than PySpark, you can use the compression option of DataFrame.to_csv, which is documented &lt;A href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html" alt="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html" target="_blank"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Oct 2021 22:24:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13215#M7929</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-10-15T22:24:50Z</dc:date>
    </item>
    <item>
      <title>Re: how to zip a dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13216#M7930</link>
      <description>&lt;P&gt;If my path is a local directory, how should I write it?&lt;/P&gt;&lt;P&gt;When I run df.coalesce(1).write.option("compression","gzip").csv("C:/Users/ag"), I get an error.&lt;/P&gt;&lt;P&gt;Also, can you provide an example of an output path to a blob storage folder?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Oct 2021 22:53:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13216#M7930</guid>
      <dc:creator>amitdatabricksc</dc:creator>
      <dc:date>2021-10-15T22:53:53Z</dc:date>
    </item>
    <item>
      <title>Re: how to zip a dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13217#M7931</link>
      <description>&lt;P&gt;Writing to a local directory does not work.&lt;/P&gt;&lt;P&gt;See this topic:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ" alt="https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ" target="_blank"&gt;https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Oct 2021 08:20:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/13217#M7931</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-10-18T08:20:14Z</dc:date>
    </item>
    <item>
      <title>Re: how to zip a dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/54258#M30032</link>
      <description>&lt;P&gt;Thanks. I have 19 CSV files in S3 and would like to zip all 19 into a single zip file. Please advise.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Nov 2023 00:58:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-zip-a-dataframe/m-p/54258#M30032</guid>
      <dc:creator>MadhanSubbiah81</dc:creator>
      <dc:date>2023-11-30T00:58:55Z</dc:date>
    </item>
  </channel>
</rss>

