I can load multiple CSV files by doing something like:
paths = ["file_1", "file_2", "file_3"]
# Parentheses let the chained calls span several lines in Python.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load(paths))
But this doesn't seem to preserve the order of the files given in paths.
In particular, I'm trying to have a monotonically increasing id that spans the data in all files.
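To make the goal concrete, here is a minimal plain-Python sketch (no Spark involved) of the result I'm after. The helper name rows_with_global_id is hypothetical, and the sketch assumes every file starts with a header row, matching the "header" option above.

```python
import csv
from itertools import count


def rows_with_global_id(paths):
    """Yield (row_id, row) pairs, reading the files in the order given by paths.

    Hypothetical helper for illustration only; assumes each file has a header row.
    """
    next_id = count()  # a single counter shared across all files
    for path in paths:
        with open(path, newline="") as f:
            # DictReader treats the first line of each file as the header.
            for row in csv.DictReader(f):
                yield next(next_id), row
```

With paths = ["file_1", "file_2", "file_3"], every row of file_1 would get a smaller id than any row of file_2, and so on. That cross-file ordering is what I'd like the Spark DataFrame to have.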