<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Write empty dataframe into csv in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28214#M20037</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Since Spark 2.4, writing a dataframe with an empty or nested empty schema using any file formats (parquet, orc, json, text, csv etc.) is not allowed. An exception is thrown when attempting to write dataframes with empty schema.&lt;/P&gt;
&lt;P&gt;Please find more details here: &lt;A href="https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24" target="test_blank"&gt;https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 25 Mar 2019 11:35:27 GMT</pubDate>
    <dc:creator>Sandeep</dc:creator>
    <dc:date>2019-03-25T11:35:27Z</dc:date>
    <item>
      <title>Write empty dataframe into csv</title>
      <link>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28212#M20035</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I'm writing my output (entity) data frame into a csv file. The statement below works well when the data frame is non-empty.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;entity.repartition(1).write.mode(SaveMode.Overwrite).format("csv").option("header", "true").save(tempLocation)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It's not working when the data frame is empty: an empty file is created, but I'm expecting at least the headers to show up so that my Tabular model won't fail with an "Invalid column" error.&lt;/P&gt;
&lt;P&gt;Anyone experienced this issue? &lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt; 
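&lt;P&gt;A common workaround (a sketch, not taken from this thread; the path and column names below are made up) is to check whether the data frame is empty and, if so, write a header-only file yourself. Stripped of Spark, the header-writing step is just:&lt;/P&gt;

```python
# Workaround sketch: Spark 2.4 writes no header row for an empty
# DataFrame, so emit a header-only CSV manually in that case.
# `path` and `columns` are hypothetical example inputs.
import csv

def write_header_only_csv(path, columns):
    """Write a UTF-8 CSV file containing only the header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(columns)

write_header_only_csv("/tmp/empty_entity.csv", ["id", "name", "value"])
```

&lt;P&gt;In Spark itself the emptiness check is typically something like df.head(1).isEmpty before falling back to this manual write.&lt;/P&gt;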
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Mar 2019 23:42:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28212#M20035</guid>
      <dc:creator>_not_provid1755</dc:creator>
      <dc:date>2019-03-18T23:42:57Z</dc:date>
    </item>
    <item>
      <title>Re: Write empty dataframe into csv</title>
      <link>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28213#M20036</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Thanks for reaching out to Databricks forum,&lt;/P&gt;
&lt;P&gt;This is a bug in open-source Spark, which is being fixed in the Spark 3 release.&lt;/P&gt;
&lt;P&gt;Here is the JIRA ticket for the issue:&lt;/P&gt;
&lt;P&gt;&lt;A target="_blank" href="https://issues.apache.org/jira/browse/SPARK-26208"&gt;https://issues.apache.org/jira/browse/SPARK-26208&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Here is the pull request with the fix, which will be merged:&lt;/P&gt;
&lt;P&gt;&lt;A target="_blank" href="https://github.com/apache/spark/pull/23173"&gt;https://github.com/apache/spark/pull/23173&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Porting the fix to the Databricks runtime versions is in the pipeline.&lt;/P&gt;
&lt;P&gt;Please let us know whether this answers your question or if you have a follow-up question.&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Mar 2019 19:16:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28213#M20036</guid>
      <dc:creator>mathan_pillai</dc:creator>
      <dc:date>2019-03-22T19:16:34Z</dc:date>
    </item>
    <item>
      <title>Re: Write empty dataframe into csv</title>
      <link>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28214#M20037</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Since Spark 2.4, writing a dataframe with an empty schema (or a schema consisting only of nested empty structs) is not allowed for any file format (parquet, orc, json, text, csv, etc.). An exception is thrown when attempting to write such dataframes.&lt;/P&gt;
&lt;P&gt;Please find more details here: &lt;A href="https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24" target="_blank"&gt;https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Mar 2019 11:35:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28214#M20037</guid>
      <dc:creator>Sandeep</dc:creator>
      <dc:date>2019-03-25T11:35:27Z</dc:date>
    </item>
    <item>
      <title>Re: Write empty dataframe into csv</title>
      <link>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28215#M20038</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have the same problem (similar code and the same behavior with Spark 2.4.0, running via spark-submit on Windows and on Linux):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;dataset.coalesce(1)
        .write()
        .option("charset", "UTF-8")
        .option("header", "true")
        .mode(SaveMode.Overwrite)
        .csv(outputDirPath);&lt;/CODE&gt;&lt;/PRE&gt;  
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 May 2019 14:23:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-empty-dataframe-into-csv/m-p/28215#M20038</guid>
      <dc:creator>mrnov</dc:creator>
      <dc:date>2019-05-07T14:23:29Z</dc:date>
    </item>
  </channel>
</rss>