How to create a dataframe with the files from S3 bucket

akj2784
New Contributor II

I have connected my S3 bucket from databricks.

Using the following command :

import urllib.parse

ACCESS_KEY = "Test"
SECRET_KEY = "Test"
ENCODED_SECRET_KEY = urllib.parse.quote(SECRET_KEY, "")
AWS_BUCKET_NAME = "Test"
MOUNT_NAME = "S3_Connection_details"
dbutils.fs.mount("s3n://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)

Now when I run the below command, I get the list of csv files present in the bucket.

display(dbutils.fs.ls("/mnt/S3_Connection"))

If there are 10 files, I want to create 10 different tables in PostgreSQL after reading the CSV files. I don't need any transformation. Is this feasible?

First of all, how do I create a dataframe from one of the CSV files? Could anyone help me with the syntax?

Regards,

Akash

5 REPLIES

shyam_9
Valued Contributor

Hi @akj2784,

Please go through the Databricks documentation on working with files in S3:

https://docs.databricks.com/spark/latest/data-sources/aws/amazon-s3.html#mount-s3-buckets-with-dbfs

akj2784
New Contributor II

I have already checked this... still not able to see data.

df = spark.read.text("mnt/S3_Connection/Details.csv")

Still I don't see data.

shyam_9
Valued Contributor

Try reading the file with one of the methods below:

df = spark.read.text("/mnt/%s/...." % MOUNT_NAME)

and

# sc.textFile returns an RDD of lines, not a DataFrame
rdd = sc.textFile("s3a://%s:%s@%s/.../..." % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME))
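A note for anyone reading along: spark.read.text loads each line of the file as a single string column, so the CSV header ends up as an ordinary row. If the goal is a dataframe with real columns, the CSV reader is probably the better fit. Below is a minimal sketch, assuming the file has a header line and the mount is visible under /mnt. The helper names mount_path and read_csv are made up for illustration, and the mount and file names in the usage comment are taken from the posts above.

```python
def mount_path(mount_name, file_name):
    """Build an absolute DBFS path such as /mnt/<mount_name>/<file_name>."""
    return "/mnt/%s/%s" % (mount_name.strip("/"), file_name.lstrip("/"))


def read_csv(spark, path):
    """Read one CSV file into a DataFrame, using the first line as column names."""
    return (
        spark.read
        .option("header", "true")       # assumption: first line holds the column names
        .option("inferSchema", "true")  # let Spark guess column types from the data
        .csv(path)
    )


# In a Databricks notebook, `spark` is already defined, so usage would look like:
# df = read_csv(spark, mount_path("S3_Connection", "Details.csv"))
# df.show(5)
```

Note the leading slash in the path: "/mnt/..." is absolute, whereas "mnt/..." is resolved relative to the current working location and may not point at the mount.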

akj2784
New Contributor II

I am able to create the dataframe, but when I run df.head(), I see only the column names. I want to see the data as well.

Please take a look at the documentation. df.head() returns only the first row by default, but you can pass an integer to get more rows: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.head

You can also use other display methods, such as df.show() or the Databricks-specific display(df).
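Coming back to the original question about loading each CSV into its own PostgreSQL table with no transformation: this should be feasible with Spark's JDBC writer. The following is only a rough sketch, assuming the PostgreSQL JDBC driver is available on the cluster and that every file has a header line. The helpers table_name_for and copy_csvs_to_postgres, the table naming rule, and all connection values are hypothetical placeholders, not something from this thread.

```python
import os
import re


def table_name_for(path):
    """Derive a table name from a file path (hypothetical naming rule)."""
    stem = os.path.splitext(os.path.basename(path.rstrip("/")))[0]
    name = re.sub(r"[^0-9a-zA-Z_]", "_", stem).lower()
    # Unquoted PostgreSQL identifiers cannot start with a digit, so add a prefix.
    return "t_" + name if name[:1].isdigit() else name


def copy_csvs_to_postgres(spark, csv_paths, jdbc_url, user, password):
    """Read each CSV file and write it unchanged to its own table over JDBC."""
    for path in csv_paths:
        df = (
            spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv(path)
        )
        (
            df.write
            .format("jdbc")
            .option("url", jdbc_url)
            .option("dbtable", table_name_for(path))
            .option("user", user)
            .option("password", password)
            .option("driver", "org.postgresql.Driver")
            .mode("overwrite")  # replaces the table if it already exists
            .save()
        )


# Usage in a Databricks notebook (connection values are placeholders):
# paths = [f.path for f in dbutils.fs.ls("/mnt/S3_Connection") if f.path.endswith(".csv")]
# copy_csvs_to_postgres(spark, paths, "jdbc:postgresql://<host>:5432/<database>", "<user>", "<password>")
```

With 10 CSV files in the mount, the loop would produce 10 tables, one per file. Change the mode to "append" or "errorifexists" if overwriting existing tables is not what you want.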
