Write to csv file in S3 bucket

mh_db
New Contributor II

I have a pandas DataFrame in my PySpark notebook that I want to save to my S3 bucket. I'm using the following command to save it:

import boto3
import s3fs

df_summary.to_csv("s3://dataconversion/data/exclude", index=False)

but I keep getting this error: ModuleNotFoundError: No module named 'botocore.compress'

I already tried upgrading boto3, but I get the same error. The problem seems to be limited to the pandas path: I'm able to read CSV files with spark.read.format('csv') without issues.

Any suggestions?

1 REPLY

shan_chandra
Esteemed Contributor

Hi @mh_db - you can import the botocore library, or if it is not found, run pip install botocore to resolve this. Alternatively, keep the data in a Spark DataFrame instead of converting it to pandas, and write the CSV from Spark; you can use coalesce(1) to produce a single CSV file (depending on your requirements).
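A minimal sketch of the pandas route once the environment is repaired. The 'botocore.compress' error typically indicates that s3fs (via aiobotocore) and botocore are at incompatible versions, so upgrading them together (e.g. pip install -U s3fs boto3) is the usual fix rather than upgrading boto3 alone; after that, to_csv accepts the s3:// URI directly. The DataFrame contents below are placeholders, not data from the question.

```python
import pandas as pd

# Placeholder summary data standing in for df_summary from the question.
df_summary = pd.DataFrame({"id": [1, 2], "status": ["keep", "exclude"]})

# With a consistent s3fs/botocore install, pandas writes directly to S3:
#   df_summary.to_csv("s3://dataconversion/data/exclude", index=False)
# Writing locally first is a quick way to confirm the DataFrame itself is fine:
df_summary.to_csv("exclude.csv", index=False)

# Round-trip check: the file reads back with the same columns and rows.
check = pd.read_csv("exclude.csv")
print(list(check.columns))
```

If the version conflict persists, the Spark-native write suggested above sidesteps s3fs entirely, since Spark uses its own Hadoop S3 connector rather than botocore.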