how to zip a dataframe

amitdatabricksc — Fri, 15 Oct 2021 22:13:54 GMT

How do I zip a DataFrame so that I get a zipped CSV output file? Please share the command. Only one DataFrame is involved, not multiple.

Re: how to zip a dataframe

Ryan_Chynoweth — Fri, 15 Oct 2021 22:24:50 GMT

If you are using PySpark, you can do something like the following:

df.coalesce(1).write.option("compression","gzip").csv("path")

Note that coalesce(1) reduces the DataFrame to a single partition, so the output is saved as a single file. Spark treats "path" as a directory and writes that one compressed part file inside it. In addition to gzip you can use "bzip2", "lz4", "snappy", and "deflate".

If you are using pandas rather than PySpark, you can use the compression option of the pandas to_csv method.
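As a rough illustration of the pandas route, the sketch below writes a small DataFrame to a .zip archive and reads it back. The sample data, the temporary directory, and the file names output.zip and output.csv are made up for the example; the compression dict with "method" and "archive_name" assumes a reasonably recent pandas (1.0 or later).

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Write into a temporary directory so the example does not touch real paths.
out_dir = tempfile.mkdtemp()
zip_path = os.path.join(out_dir, "output.zip")

# method="zip" produces a .zip archive; archive_name is the CSV's name inside it.
df.to_csv(
    zip_path,
    index=False,
    compression={"method": "zip", "archive_name": "output.csv"},
)

# Inspect the archive contents with the standard library.
with zipfile.ZipFile(zip_path) as zf:
    member_names = zf.namelist()

# pandas infers zip compression from the .zip extension when reading back.
round_trip = pd.read_csv(zip_path)
```

Passing compression="gzip" instead would give a .csv.gz file, which is what the Spark option above produces; gzip and zip are different formats.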

Re: how to zip a dataframe

amitdatabricksc — Fri, 15 Oct 2021 22:53:53 GMT

If my path is a local directory, how should I write it?

When I run df.coalesce(1).write.option("compression","gzip").csv("C:/Users/ag") I get an error.

Also, can you provide an example of an output path to a Blob Storage folder?

Re: how to zip a dataframe

-werners- — Mon, 18 Oct 2021 08:20:14 GMT

Writing to a local directory does not work.

See this topic:

https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ

Re: how to zip a dataframe

MadhanSubbiah81 — Thu, 30 Nov 2023 00:58:55 GMT

Thanks. I have 19 CSV files in S3 and would like to zip all 19 of them into a single zip file. Please advise.
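One possible approach, sketched here with only the Python standard library: collect the CSV contents as bytes and write them into a single zip archive in memory. The function name zip_csv_files and the generated sample files are hypothetical. Fetching the objects from S3 and uploading the result (for example with boto3's get_object and put_object) is assumed to happen outside this snippet and is not shown.

```python
import io
import zipfile


def zip_csv_files(files):
    """Bundle a {archive_name: csv_bytes} mapping into one zip; return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            # Each CSV becomes one member of the archive.
            zf.writestr(name, data)
    return buffer.getvalue()


# Hypothetical stand-ins for the 19 CSV objects; real bytes would come from S3.
sample_files = {
    f"part_{i:02d}.csv": f"id,value\n{i},{i * 10}\n".encode()
    for i in range(19)
}

archive_bytes = zip_csv_files(sample_files)
```

Building the archive in memory is fine for small files; for large ones, writing the zip to a temporary file on disk first would avoid holding everything in RAM.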