Data Engineering

Write to CSV file in S3 bucket

mh_db
New Contributor III

I have a pandas DataFrame in my PySpark notebook, and I want to save it to my S3 bucket. I'm using the following command to save it:

import boto3
import s3fs

df_summary.to_csv("s3://dataconversion/data/exclude", index=False)

but I keep getting this error: ModuleNotFoundError: No module named 'botocore.compress'

I already tried upgrading boto3, but I get the same error. The problem seems to affect only the pandas libraries; I'm able to read the CSV with spark.read.format('csv') without issues.

Any suggestions?

1 REPLY

shan_chandra
Databricks Employee

Hi @mh_db - you can import the botocore library, or if it is not found, run pip install botocore to resolve this. Alternatively, you can keep the data in a Spark DataFrame instead of converting it to pandas, and when writing to CSV use coalesce(1) to produce a single CSV file (depending on your requirements). Sketches of both approaches follow.
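
A minimal sketch of the first suggestion, assuming a Databricks notebook where the %pip magic is available and the cluster already has credentials for the bucket. Upgrading boto3, botocore, and s3fs together (rather than boto3 alone) keeps their versions in step, since a botocore version that lags behind boto3 is a plausible source of the botocore.compress import error. The .csv suffix on the path is an addition for clarity, not part of the original question:

# Upgrade the three packages together so their versions stay compatible.
%pip install --upgrade boto3 botocore s3fs

import pandas as pd

# df_summary is the pandas DataFrame from the question. Writing through an
# s3:// URL relies on s3fs, which pandas picks up automatically via fsspec.
df_summary.to_csv("s3://dataconversion/data/exclude.csv", index=False)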
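
And a minimal sketch of the alternative, assuming df_summary can be converted back to a Spark DataFrame. Note that Spark writes a directory containing a single part-*.csv file rather than a bare file with the given name:

# Recreate a Spark DataFrame if the data currently lives in pandas.
spark_df = spark.createDataFrame(df_summary)

(spark_df
    .coalesce(1)                  # collapse to one partition -> one part file
    .write
    .mode("overwrite")
    .option("header", True)
    .csv("s3://dataconversion/data/exclude"))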
