Load multiple csv files into a dataframe in order

Shridhar — Thu, 18 Oct 2018 01:24:35 GMT

I can load multiple csv files by doing something like:

paths = ["file_1", "file_2", "file_3"]
# Parentheses let the chained reader calls span several lines in Python
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load(paths))

But this doesn't seem to preserve the order of the files in paths.

In particular, I'm trying to have a monotonically increasing id that spans the data in all files.

Re: Load multiple csv files into a dataframe in order

JayaKommuru — Wed, 20 Nov 2019 03:50:40 GMT

@shridhar, have you found an alternative for achieving this? I have the same problem.

Re: Load multiple csv files into a dataframe in order

Jaswanth_Saniko — Wed, 12 Jan 2022 12:43:10 GMT

val diamonds = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/11.csv","/FileStore/tables/12.csv","/FileStore/tables/13.csv")
 
display(diamonds)

This is working for me, @Shridhar.