Index a DataFrame from a CSV file based on the file's original order (based on the entire row, not on any specific column) using Spark
How can I guarantee that the index always follows the file's original order, no matter what? Currently, I'm using val df = spark.read.options(Map("header" -> "true", "inferSchema" -> "true")).csv("filePath").withColumn("index", monotonically_increasing...
Latest Reply
monotonically_increasing_id will not do that, because its purpose is only to guarantee that every partition gets its own separate range of ids. What is the whole code? Are you loading a directory with a lot of CSVs? What does "original order" mean? Are the CSVs ordered by file creation date, or by file name? o...
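For reference, here is a minimal sketch of one commonly used alternative, assuming a single CSV file and no shuffle or repartition between the read and the indexing step. It uses RDD.zipWithIndex, which numbers rows consecutively by partition order and then by position within each partition. The object name IndexCsvRows and the output column name "index" are hypothetical, and "filePath" is the placeholder from the question. Spark does not formally promise that partition order matches file order, so treat this as something to verify on your own data rather than a hard guarantee.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical object name, for illustration only.
object IndexCsvRows {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("index-csv-rows").getOrCreate()

    // Same read options as in the question; "filePath" is a placeholder.
    val df = spark.read
      .options(Map("header" -> "true", "inferSchema" -> "true"))
      .csv("filePath")

    // zipWithIndex pairs each row with a consecutive Long, ordered first by
    // partition index and then by position inside the partition.
    // Assumption: nothing has shuffled or repartitioned df before this point.
    val indexedRdd = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ idx)
    }

    // Extend the original schema with the new Long column.
    val indexedSchema = StructType(
      df.schema.fields :+ StructField("index", LongType, nullable = false)
    )

    val indexed = spark.createDataFrame(indexedRdd, indexedSchema)
    indexed.show()

    spark.stop()
  }
}
```

If the input is a directory of CSVs, this sketch says nothing about the order across files, so the questions above about what "original order" means still need an answer first.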