08-02-2018 12:09 AM
I have a PySpark DataFrame df containing 4 columns. How can I write this DataFrame to an S3 bucket?
I'm using PyCharm to execute the code. Also, what packages need to be installed?
08-04-2018 04:16 AM
You shouldn't need any extra packages. You can mount the S3 bucket to your Databricks cluster:
https://docs.databricks.com/spark/latest/data-sources/aws/amazon-s3.html#mount-aws-s3
or see this tutorial:
http://www.sparktutorials.net/Reading+and+Writing+S3+Data+with+Apache+Spark
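For reference, here's a minimal sketch of the mount-then-write approach from the first link. It assumes you're running in a Databricks notebook where `spark` and `dbutils` already exist; the bucket name, mount point, and output folder below are placeholders you'd replace with your own.

```python
# Hypothetical names -- substitute your own bucket and mount point.
aws_bucket_name = "my-bucket"
mount_name = "my-bucket-mount"

# Mount the S3 bucket once per workspace. Credentials come from the
# cluster's IAM instance profile (or access keys passed via extra_configs).
dbutils.fs.mount("s3a://%s" % aws_bucket_name, "/mnt/%s" % mount_name)

# Write the 4-column DataFrame to the mounted bucket as Parquet
# (df.write.csv(...) works the same way if you need CSV).
df.write.mode("overwrite").parquet("/mnt/%s/output/my_df" % mount_name)
```

If you're running plain PySpark from PyCharm rather than a Databricks notebook, mounting isn't available; in that case you'd set the `fs.s3a.access.key` / `fs.s3a.secret.key` Hadoop options on the SparkContext and write directly to an `s3a://` path, which is roughly what the second link walks through.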