Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Write to CSV file in S3 bucket

mh_db
New Contributor III

I have a pandas DataFrame in my PySpark notebook that I want to save to my S3 bucket. I'm using the following command to save it:

import boto3
import s3fs

# pandas resolves s3:// paths through s3fs when writing CSV
df_summary.to_csv("s3://dataconversion/data/exclude", index=False)

but I keep getting this error: ModuleNotFoundError: No module named 'botocore.compress'

I already tried upgrading boto3, but I get the same error. The problem seems to be limited to the pandas path: I can read CSV files with spark.read.format('csv') without issues.
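
For context, the upgrade was along these lines (the exact package list is approximate):

# Upgrade the AWS client libraries together; a boto3/botocore version
# mismatch is a common cause of missing-submodule import errors
%pip install --upgrade boto3 botocore s3fs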

Any suggestions?

1 REPLY

shan_chandra
Esteemed Contributor

Hi @mh_db - you can import the botocore library, or if it is not found, run pip install botocore to resolve this. Alternatively, you can keep the data in a Spark DataFrame rather than converting it to pandas before writing to CSV; you can use coalesce(1) to write a single CSV file (depending on your requirements).
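
A minimal sketch of the Spark-native approach, assuming df_summary starts out as a pandas DataFrame as in your snippet and the cluster already has write access to the bucket (spark_df is just an illustrative name; the path is reused from your question):

# Convert the pandas DataFrame to a Spark DataFrame
spark_df = spark.createDataFrame(df_summary)

# coalesce(1) collapses the data to one partition so the write
# produces a single CSV part file instead of one per partition
(spark_df
    .coalesce(1)
    .write
    .mode("overwrite")
    .option("header", "true")
    .csv("s3://dataconversion/data/exclude"))

Note that Spark writes a directory at that path containing a part-*.csv file, not a single flat file; coalesce(1) only guarantees there is one part file inside it.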
