How to zip a DataFrame
10-15-2021 03:13 PM
How do I zip a DataFrame so that I get a zipped CSV output file? Please share the command. Only one DataFrame is involved, not several.
Labels: Dataframe, Sf Username, Zip
10-15-2021 03:24 PM
If you are using PySpark, you can do something like the following:
df.coalesce(1).write.option("compression","gzip").csv("path")
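Note that coalesce(1) reduces the DataFrame to a single partition so that the data is saved as a single file. Spark treats "path" as an output directory and writes that one part file inside it. The result is a gzip-compressed CSV (.csv.gz), not a .zip archive. In addition to "gzip" you can use "bzip2", "lz4", "snappy", and "deflate".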
If you are not using PySpark and are using pandas instead, you can use the compression option of DataFrame.to_csv, which can be found here.
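For the pandas route, a minimal sketch might look like the following. It assumes the DataFrame fits in memory and that a real .zip archive (rather than gzip) is wanted. The helper name write_zipped_csv and the default file name data.csv are made up for illustration; the compression dict with "method" and "archive_name" is the pandas to_csv option.

```python
import zipfile

import pandas as pd


def write_zipped_csv(df: pd.DataFrame, zip_path: str, csv_name: str = "data.csv") -> None:
    """Write df as one CSV file inside a .zip archive at zip_path.

    write_zipped_csv and csv_name are illustrative names, not part of pandas.
    """
    df.to_csv(
        zip_path,
        index=False,  # leave the row index out of the CSV
        # "archive_name" sets the name of the CSV file stored inside the zip
        compression={"method": "zip", "archive_name": csv_name},
    )


def list_archive(zip_path: str) -> list:
    """Return the file names stored in the archive, to check the result."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

Calling write_zipped_csv(df, "output.zip") and then list_archive("output.zip") should show the single CSV inside the archive. pandas.read_csv("output.zip") can read it back directly as long as the archive holds only one file.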
11-29-2023 04:58 PM
Thanks. I have 19 CSV files in S3 and would like to bundle all 19 into a single zip file. Please advise.
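One possible approach, sketched here with only the Python standard library, is to bundle the files with zipfile. This assumes the CSV files have already been copied from S3 to a local folder (for example with the AWS CLI or boto3, which is not shown). The function name zip_csv_files and the folder layout are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_csv_files(src_dir: str, zip_path: str) -> list:
    """Bundle every *.csv file directly under src_dir into one zip archive.

    Returns the names stored in the archive. zip_csv_files is an
    illustrative helper, and src_dir is assumed to be a local folder.
    """
    csv_files = sorted(Path(src_dir).glob("*.csv"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for csv_file in csv_files:
            # Store each file under its bare name, without the folder path
            zf.write(csv_file, arcname=csv_file.name)
    return [f.name for f in csv_files]
```

The resulting archive could then be uploaded back to S3 as a single object. Non-CSV files in the folder are skipped by the glob pattern.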
10-15-2021 03:53 PM
If my path is a local directory, how should I write it?
When I run df.coalesce(1).write.option("compression","gzip").csv("C:/Users/ag") I get an error.
Also, can you provide an example of an output path to a blob storage folder?
10-18-2021 01:20 AM
Writing to a local directory does not work.
See this topic: