Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Writing a DataFrame to a CSV file: the CSV file is blank

Nik
New Contributor III

Hi,

I am reading a text file from a blob:

val sparkDF = spark.read.format(file_type)
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", file_delimiter)
.load(wasbs_string + "/" + PR_FileName)

Then I test my DataFrame:

sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)

and do some other things, like:

val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)

Up to this point everything works. Finally, I want to write it to another blob.

The write runs, but the CSV file is empty. Why? Can someone help?

sqlDF.write.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "|")
  .save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")

It runs without errors, but no data is written. And yes, it does create the temp folder "mydata.csv" and the file mydata.csv, but the file has no header and no data.

Thanks

19 replies

User16829051266
New Contributor III

Hey Nik,

Can you do a file listing on that directory ".../MyPathName/mydata.csv/" and post the names of the files here?

Your data should be in the CSV file(s) whose names begin with "part-00000-tid-xxxxx.csv". Each partition is written to a separate CSV file, unless you reduce the data to a single partition when writing, for example with:

sqlDF.coalesce(1).write.format("com.databricks.spark.csv")...
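For illustration, here is a small Scala sketch of how to tell the data files apart from the bookkeeping files in such an output folder. The helper name "dataFiles" and the sample file names are hypothetical; the assumption is that only names starting with "part-" and ending in ".csv" hold rows, while entries such as "_SUCCESS" (and, on Databricks, typically "_started_..." / "_committed_..." markers) do not.

```scala
object PartFiles {
  // Hypothetical helper: keep only the data files Spark writes into an output
  // folder ("part-00000-...csv", "part-00001-...csv", ...) and skip bookkeeping
  // entries such as "_SUCCESS" or "_started_..." / "_committed_..." markers.
  // Results are sorted so they come back in partition order.
  def dataFiles(names: Seq[String]): Seq[String] =
    names.filter(n => n.startsWith("part-") && n.endsWith(".csv")).sorted

  def main(args: Array[String]): Unit = {
    // Made-up listing of a folder such as ".../MyPathName/mydata.csv/".
    val listing = Seq(
      "_SUCCESS",
      "_started_123",
      "_committed_123",
      "part-00001-tid-123-c000.csv",
      "part-00000-tid-123-c000.csv"
    )
    dataFiles(listing).foreach(println)
  }
}
```

Running main prints "part-00000-tid-123-c000.csv" followed by "part-00001-tid-123-c000.csv"; in a real notebook the names would come from a directory listing instead of a hard-coded Seq.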

Nik
New Contributor III

I only have one file in the folder "MyFolder"; it's a simple readme file.

sqlDF.write.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "|")
  .save("wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv")

I do not see "part-00000-tid-xxxxx.csv". By the way, I am saving into a blob folder; as I mentioned, the file is being created in the blob folder, but with no data.

User16829051266
New Contributor III

Can you look in the folder "mydata.csv"? In this context, mydata.csv should be a folder, not a file.
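If it helps, here is a minimal Scala sketch that lists what actually sits under an output path such as ".../MyFolder/mydata.csv", together with each entry's size. It assumes the Hadoop FileSystem API that ships with Spark; the object and method names are made up, and the demo writes to a local temp folder rather than a wasbs:// path. On Azure Blob storage, an empty "mydata.csv" entry is most likely the zero-length blob the WASB driver uses to represent a folder, which would explain why it looks like an empty file while the rows live in the part files inside it.

```scala
import java.nio.file.Files
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object ListOutput {
  // Return (name, size in bytes) for every entry directly under `dir`.
  // `dir` is the path passed to .save()/.csv(), which Spark treats as a folder.
  def listOutput(spark: SparkSession, dir: String): Seq[(String, Long)] = {
    val path = new Path(dir)
    val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.listStatus(path).toSeq.map(s => (s.getPath.getName, s.getLen))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("list-output").getOrCreate()
    import spark.implicits._

    // Local stand-in for ".../MyFolder/mydata.csv"; in the thread this would be the wasbs:// path.
    val dir = Files.createTempDirectory("listdemo").resolve("mydata.csv").toString
    Seq((1, "a"), (2, "b")).toDF("id", "name")
      .write.option("header", "true").option("delimiter", "|").csv(dir)

    // Prints one line per entry, e.g. the part file(s) and the _SUCCESS marker.
    listOutput(spark, dir).foreach { case (name, size) => println(s"$name\t$size bytes") }
    spark.stop()
  }
}
```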

User16829051266
New Contributor III

It would be helpful to provide a screenshot of your Blob storage to see what your directory looks like.

Nik
New Contributor III

OK, let me run it first.

Nik
New Contributor III

Yes, you are right. The file in the root is empty, but the file named

part-00001-tid-4180453911607372978-2f22edb4-c9ca-47c8-8791-81cb9b71824c-8-c000.csv

has the data. But why?

I asked the function to save into:

wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv

Do you know another way to save the data in the right folder, under the right file name?
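One common workaround, sketched here in Scala under stated assumptions: write a single partition to a scratch folder, then rename the one part file to the file name you actually want and delete the scratch folder. The helper name "writeSingleCsv" and the "_tmp_" scratch-folder naming are hypothetical, and the sketch relies on the Hadoop FileSystem API available to Spark; the demo in main writes locally instead of to wasbs://.

```scala
import java.nio.file.{Files, Paths}
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleCsv {
  // Hypothetical helper: write `df` as exactly one CSV file at `target`
  // (e.g. ".../MyFolder/mydata.csv") instead of a folder of part files.
  def writeSingleCsv(df: DataFrame, target: String, delimiter: String = "|"): Unit = {
    val targetPath = new Path(target)
    val tmpDir = new Path(targetPath.getParent, s"_tmp_${targetPath.getName}")
    val fs = targetPath.getFileSystem(df.sparkSession.sparkContext.hadoopConfiguration)

    // 1. Write a single partition into a scratch folder next to the target.
    df.coalesce(1).write.mode("overwrite")
      .option("header", "true").option("delimiter", delimiter)
      .csv(tmpDir.toString)

    // 2. Locate the one part file Spark produced.
    val parts = fs.globStatus(new Path(tmpDir, "part-*.csv"))
    require(parts != null && parts.length == 1, s"expected one part file in $tmpDir")

    // 3. Move it to the requested name, then remove the scratch folder.
    if (fs.exists(targetPath)) fs.delete(targetPath, true)
    fs.mkdirs(targetPath.getParent)
    require(fs.rename(parts.head.getPath, targetPath), s"rename to $target failed")
    fs.delete(tmpDir, true)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("single-csv").getOrCreate()
    import spark.implicits._
    val target = Files.createTempDirectory("single").resolve("MyFolder").resolve("mydata.csv").toString
    writeSingleCsv(Seq((1, "a"), (2, "b")).toDF("id", "name"), target)
    println(new String(Files.readAllBytes(Paths.get(target)), "UTF-8"))
    spark.stop()
  }
}
```

In the thread's setup, usage would look like SingleCsv.writeSingleCsv(sqlDF, "wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv"). Because coalesce(1) funnels all rows through a single task, this only makes sense for data small enough for one executor to handle, such as a lookup table.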

Nik
New Contributor III

I also tried with

sqlDF.write.format("com.databricks.spark.csv")

Same thing: the CSV file is empty.

Nik
New Contributor III


screenshot

Nik
New Contributor III

In Python there is something like [-1] that saves the file in the previous folder. Do we have this in Scala?

Nik
New Contributor III

I have noticed that the data has been split into 2 CSV files in the folder. Why is that?

lalithagutthi
New Contributor II

Hi, I have the same issue Nik has been facing. What could the solution be? I am expecting a single CSV file rather than a folder containing unknown files.

lalithagutthi
New Contributor II

Can I get an update on this? I had the same issue and am not sure about the resolution.

Nik
New Contributor III

We are going to work on this next week; please wait. Thanks.

User16829051266
New Contributor III

The number of files written corresponds to the number of partitions in the Spark DataFrame. To reduce the output to one file, use coalesce(1):

sqlDF.coalesce(1).write.csv(<file-path>)...
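To see the partition-to-file relationship for yourself, here is a small local Scala sketch; the object name, row counts, and temp paths are made up for illustration. It checks df.rdd.getNumPartitions before and after coalesce(1) and counts the part files each write produces. Note that coalesce(1) merges existing partitions without a full shuffle, while repartition(1) shuffles the data into one partition; either way the write yields a single part file.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object PartitionsToFiles {
  // Count the CSV part files Spark wrote into a local output folder.
  def partFileCount(dir: String): Int =
    Option(new File(dir).listFiles).getOrElse(Array.empty[File])
      .count(f => f.getName.startsWith("part-") && f.getName.endsWith(".csv"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("partitions-to-files").getOrCreate()
    import spark.implicits._
    val base = Files.createTempDirectory("partdemo").toString

    // Two partitions -> one part file per partition.
    val df = (1 to 100).toDF("id").repartition(2)
    println(s"partitions before: ${df.rdd.getNumPartitions}")
    df.write.csv(s"$base/two.csv")
    println(s"files written: ${partFileCount(s"$base/two.csv")}")

    // coalesce(1) merges everything into one partition -> one part file.
    val single = df.coalesce(1)
    println(s"partitions after coalesce(1): ${single.rdd.getNumPartitions}")
    single.write.csv(s"$base/one.csv")
    println(s"files written: ${partFileCount(s"$base/one.csv")}")

    spark.stop()
  }
}
```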