
How can I read all the files in a folder on S3 into several pandas dataframes?

zhaoxuan210
New Contributor

import pandas as pd
import glob

path = "s3://somewhere/" # use your path
all_files = glob.glob(path + "/*.csv")
print(all_files)

li = []
for filename in all_files:
    dfi = pd.read_csv(filename, names=['acct_id', 'SOR_ID'], dtype={'acct_id': str, 'SOR_ID': str}, header=None)
    li.append(dfi)

I can read a file if I pass the path of a single one directly, but glob is not working here: all_files comes back as an empty list []. How can I get the list of filenames as an array?
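For context, the standard-library glob module only searches the local filesystem, so an s3:// pattern matches nothing and returns []. One possible approach, sketched below, is to list the objects through a filesystem object that understands S3. The sketch assumes the s3fs package is installed and AWS credentials are configured; the helper name read_csv_folder is hypothetical and not part of pandas or s3fs.

```python
import pandas as pd


def read_csv_folder(fs, pattern, **read_csv_kwargs):
    """Read every file matching `pattern` on `fs` into its own DataFrame.

    `fs` is any object exposing glob(pattern) and open(path, mode), for
    example an s3fs.S3FileSystem (an assumption, not shown in the post).
    Returns a dict mapping each matched path to its DataFrame.
    """
    frames = {}
    for path in sorted(fs.glob(pattern)):
        # Open through the same filesystem that produced the listing.
        with fs.open(path, "rb") as handle:
            frames[path] = pd.read_csv(handle, **read_csv_kwargs)
    return frames


# Possible usage against S3 (assumes s3fs and valid credentials):
#
#   import s3fs
#   fs = s3fs.S3FileSystem()
#   frames = read_csv_folder(
#       fs,
#       "s3://somewhere/*.csv",
#       names=["acct_id", "SOR_ID"],
#       dtype={"acct_id": str, "SOR_ID": str},
#       header=None,
#   )
#   li = list(frames.values())
```

Keeping the filesystem as a parameter means the same helper can be tried against a local or in-memory filesystem before pointing it at a bucket.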

1 REPLY

shyam_9
Valued Contributor