Databricks

User16790091296 · ‎06-24-2021

I've tried:

df.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv(dstPath)

and

df.write.format("csv").mode("overwrite").save(dstPath)

but both give me 10 CSV files. I need a single file, with a name I choose.

Ryan_Chynoweth · ‎06-24-2021

The question in your title seems different from the one in your post. I am assuming you are asking how to get only a single CSV file when writing?

To do so, use coalesce(1) before writing:

df.coalesce(1).write.format("csv").mode("overwrite").save(dstPath)

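This saves a single CSV file inside the directory you provide. Spark still picks the file name itself (part-...), so if you want a specific name you will need to rename the file afterwards. You could use dbutils.fs.cp() to copy the file to a new name, or use the Python "os" library to rename it. Note that coalesce(1) puts all the data in a single partition, so it is only a good fit when the data is small enough for one worker to write.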
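The rename step could look something like the sketch below. It only uses the Python standard library, so it assumes the output directory is reachable as a normal file path (on Databricks that usually means going through the /dbfs mount, which may not be available on every cluster type). The function name promote_single_csv and its parameters are made up for this example and are not part of any Databricks API.

```python
import glob
import os
import shutil


def promote_single_csv(spark_output_dir: str, target_path: str) -> str:
    """Move the lone part-*.csv file out of a Spark output directory.

    Hypothetical helper: assumes the directory was written with
    coalesce(1), so it holds exactly one part file.
    """
    # Spark names its output part-<number>-<id>.csv; marker files such as
    # _SUCCESS sit next to it and are ignored by this pattern.
    part_files = glob.glob(os.path.join(spark_output_dir, "part-*.csv"))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file, found {len(part_files)}"
        )

    # Create the destination folder if it does not exist yet.
    os.makedirs(os.path.dirname(target_path) or ".", exist_ok=True)
    shutil.move(part_files[0], target_path)
    return target_path
```

After the write above, and assuming dstPath is a DBFS path such as "/tmp/out", you would call something like promote_single_csv("/dbfs" + dstPath, "/dbfs/tmp/my_report.csv"). The same idea works with dbutils.fs.ls() to find the part file and dbutils.fs.cp() to copy it, if you would rather stay with dbutils.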
