How do I create a single CSV file from multiple pa...

rlgarris · 12-02-2015

I'm using spark-csv to write data to DBFS, and I plan to copy the output to my laptop with standard S3 copy commands.

By default, spark-csv writes its output as one file per partition. I can force the data into a single partition before writing, but I'd really like to know whether there is a generic way to merge the part files into one CSV.

On a Hadoop file system, I'd simply run something like:

hadoop fs -getmerge /user/hadoop/dir1/ ./myoutput.txt

Is there an equivalent from within the Databricks platform?
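
To make the goal concrete, here is a minimal Python sketch of the kind of merge I have in mind. It assumes the output directory is reachable as an ordinary local path and that the part files follow the usual part-* naming. The function name merge_csv_parts and its has_header parameter are hypothetical, not part of spark-csv or Databricks.

```python
import glob
import os
import shutil


def merge_csv_parts(src_dir, dest_path, has_header=False):
    """Concatenate the part-* files in src_dir into dest_path, in name order.

    Assumption: every part ends with a newline. If has_header is True, the
    first line of each part is treated as a header row and only one copy
    is written to the merged file.
    """
    # Sort by name so part-00000, part-00001, ... keep their order.
    # Marker files such as _SUCCESS do not match the pattern and are skipped.
    part_paths = sorted(glob.glob(os.path.join(src_dir, "part-*")))
    wrote_header = False
    with open(dest_path, "wb") as dest:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                if has_header:
                    header = part.readline()
                    if not wrote_header:
                        dest.write(header)
                        # An empty part yields b"", so keep looking.
                        wrote_header = bool(header)
                # Stream the remaining bytes without loading the file in memory.
                shutil.copyfileobj(part, dest)
    return len(part_paths)
```

With hypothetical paths, a call would look like merge_csv_parts("/path/to/output_dir", "/path/to/myoutput.csv", has_header=True). Like getmerge, this only concatenates files that already exist; it does not change how Spark partitions the data.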

How do I create a single CSV file from multiple partitions in Databricks / Spark?