
Load multiple csv files into a dataframe in order

Shridhar
New Contributor

I can load multiple csv files by doing something like:

paths = ["file_1", "file_2", "file_3"]
# Parentheses let the chained reader calls span several lines in Python.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load(paths))

But this doesn't seem to preserve the order of the files in paths.

In particular, I'm trying to have a monotonically increasing id that spans the data in all files.
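The ordering being asked for can be stated precisely without Spark: each row carries the position of its file in paths and its line number inside that file, and the global id is the row's rank when sorted on that pair. The plain-Python sketch below only illustrates that rule and is not Spark code; the helper name assign_global_ids and the (file_index, line_number, record) tuple layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# A row as a parallel reader might emit it: (file_index, line_number, record).
TaggedRow = Tuple[int, int, Dict[str, str]]


def assign_global_ids(rows: Iterable[TaggedRow]) -> List[Tuple[int, Dict[str, str]]]:
    """Return (global_id, record) pairs in "file after file" order.

    file_index is the position of the source file in `paths`; line_number is
    the row's position inside that file. Sorting on the pair restores the
    intended order however the rows were interleaved, and enumerate() then
    yields ids that increase monotonically across all files.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return [(gid, record) for gid, (_, _, record) in enumerate(ordered)]


# Rows arriving out of order, as they might from a parallel read.
shuffled = [
    (1, 0, {"v": "c"}),
    (0, 1, {"v": "b"}),
    (2, 0, {"v": "e"}),
    (0, 0, {"v": "a"}),
    (1, 1, {"v": "d"}),
]
for gid, record in assign_global_ids(shuffled):
    print(gid, record["v"])
```

Because the sort key rather than the arrival order decides each id, the result does not depend on how the rows were interleaved, which is exactly the property lost when the files are read in parallel.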

Accepted Solution

Jaswanth_Saniko
New Contributor III
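// Read three CSV files into a single DataFrame.
val diamonds = spark.read.format("csv")
  .option("header", "true")      // first line of each file holds the column names
  .option("inferSchema", "true") // scan the data to infer column types
  .load("/FileStore/tables/11.csv", "/FileStore/tables/12.csv", "/FileStore/tables/13.csv")

// display() is the Databricks notebook helper for rendering a DataFrame.
display(diamonds)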

This is working for me, @Shridhar.


Replies

JayaKommuru
New Contributor II

@Shridhar, have you found an alternative way to achieve this? I have the same problem.
