Databricks

User16790091296 · 06-24-2021

I've tried:

df.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv(dstPath)

and

df.write.format("csv").mode("overwrite").save(dstPath)

but both give me 10 CSV files. I need a single file, and I need to give it a specific name.

Ryan_Chynoweth · 06-24-2021

The question in the title seems different from the one in the body. I am assuming you are asking how to get a single CSV file when writing?

To do so, use coalesce(1) before the write:

df.coalesce(1).write.format("csv").mode("overwrite").save(dstPath)

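This saves a single CSV part file inside the directory you provide; Spark still treats dstPath as a directory and names the file itself (part-...). If you want a specific file name, you need to rename the file after the write. You can use dbutils.fs.cp() to copy the file to a new name (or dbutils.fs.mv() to move it), or you can use the Python "os" library to rename it, going through the /dbfs local mount if the path is on DBFS. Keep in mind that coalesce(1) sends all the data through a single task, so it works best when the output is small enough for one worker.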
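As a rough sketch of the rename step, the helper below finds the single part file in the output directory and moves it to the name you want. The function name promote_single_part_file and the example paths in the comments are hypothetical, and the sketch assumes the output directory is reachable through the local file API (on Databricks that usually means prefixing a DBFS path with /dbfs).

```python
import glob
import os
import shutil


def promote_single_part_file(output_dir: str, target_path: str) -> str:
    """Move the lone part-*.csv file from a Spark output directory to target_path.

    Hypothetical helper for illustration; it is not part of Spark or dbutils.
    """
    part_files = glob.glob(os.path.join(output_dir, "part-*.csv"))
    if len(part_files) != 1:
        # coalesce(1) should leave exactly one part file; anything else is unexpected.
        raise ValueError(
            f"expected exactly one part file in {output_dir}, found {len(part_files)}"
        )
    shutil.move(part_files[0], target_path)
    return target_path


# Intended use on Databricks (assumed paths, shown as comments because
# they need a running cluster and a DataFrame named df):
#
#   df.coalesce(1).write.format("csv").mode("overwrite") \
#       .option("header", "true").save(dstPath)
#   promote_single_part_file("/dbfs" + dstPath, "/dbfs/tmp/my_report.csv")
```

The helper leaves the original directory (with _SUCCESS and any other marker files) in place, so you may want to delete it afterwards, for example with dbutils.fs.rm(dstPath, True).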
