Write from a DataFrame to a CSV file: the CSV file is blank
09-04-2018 10:03 AM
Hi,
I am reading a text file from a blob:
val sparkDF = spark.read.format(file_type)
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", file_delimiter)
.load(wasbs_string + "/" + PR_FileName)
Then I test my DataFrame:
sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)
and do some other things like:
val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)
Up to this point everything works. Finally, I want to write it to another blob.
It works, but the CSV file is empty. Why? Can someone help?
sqlDF.write.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", "|")
.save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")
It runs with no data, and yes, it does create the temp folder "mydata.csv" and the file mydata.csv,
but the file has no headers and no data.
thanks
09-04-2018 10:59 AM
Hey Nik,
Can you do a file listing on that directory ".../MyPathName/mydata.csv/" and post the names of the files here?
Your data should be located in the CSV file(s) that begin with "part-00000-tid-xxxxx.csv", with each partition in a separate CSV file, unless you coalesce the DataFrame to a single partition when writing:
sqlDF.coalesce(1).write.format("com.databricks.spark.csv")...
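As a minimal sketch of such a listing, assuming this runs in a Databricks notebook where dbutils is available (the path is the placeholder from the original post):
// List everything Spark wrote under the output "file" (really a folder).
dbutils.fs.ls("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv/")
  .foreach(f => println(s"${f.name} (${f.size} bytes)"))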
09-04-2018 12:13 PM
I only have one file in the folder "MyFolder"; it's a simple read-me file.
sqlDF.write.format(file_type)
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", "|")
.save("wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv")
I do not see "part-00000-tid-xxxxx.csv". By the way, I am saving into a blob folder; as I mentioned, the file is getting created in the blob folder but with no data.
09-04-2018 12:32 PM
Can you look in the folder "mydata.csv"? In this context, mydata.csv should be a folder, not a file.
09-04-2018 12:40 PM
It would be helpful to provide a screenshot of your Blob storage to see what your directory looks like.
09-04-2018 12:46 PM
OK, let me run it first.
09-04-2018 12:51 PM
Yes, you are right: the file in the root is empty, but the file with the name
part-00001-tid-4180453911607372978-2f22edb4-c9ca-47c8-8791-81cb9b71824c-8-c000.csv
has the data. But why?
I asked the function to save into
wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv
Do you know another way to save the data in the right folder and the right file?
09-04-2018 12:57 PM
I tried with
sqlDF.write.format("com.databricks.spark.csv")
and it's the same thing: the CSV file is empty.
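For reference, on Spark 2.x the external com.databricks.spark.csv package was folded into Spark itself, and that name is kept only as an alias for the built-in csv source, so both writers behave identically. A minimal equivalent using the built-in source (same placeholder path as the earlier posts):
// Same behavior as format("com.databricks.spark.csv") on Spark 2.x.
sqlDF.write
  .format("csv")
  .option("header", "true")
  .option("delimiter", "|")
  .save("wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv")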
09-04-2018 01:06 PM
In Python they have something like [-1] that will save the file in the previous folder. Do we have this in Scala?
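Spark's DataFrameWriter always writes a directory of part files, so there is no direct equivalent. A common workaround, sketched here under the assumption that this runs on Databricks where dbutils is available (the tmp_mydata folder name is hypothetical), is to coalesce to one partition, write to a temporary folder, then move the single part file to the name you want:
// 1. Write everything into one partition, into a temporary folder.
val tmpDir = "wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/tmp_mydata"
sqlDF.coalesce(1).write
  .format("csv")
  .option("header", "true")
  .option("delimiter", "|")
  .save(tmpDir)
// 2. Locate the single part file Spark produced and rename it.
val partFile = dbutils.fs.ls(tmpDir).map(_.path).find(_.contains("part-")).get
dbutils.fs.mv(partFile, "wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv")
// 3. Remove the temporary folder and its remaining marker files.
dbutils.fs.rm(tmpDir, recurse = true)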
09-04-2018 07:16 PM
I have noticed that the data has been split into 2 CSV files in the folder. Why is that?
09-11-2018 11:47 AM
Hi, I have the same issue that Nik has been facing. What could be the solution? I am expecting one CSV file rather than a folder containing unknown files.
09-13-2018 08:34 AM
Can I have an update on this? I had the same issue and am not sure about the resolution.
09-16-2018 03:30 PM
We are going to work on this next week. Please wait, thanks.
09-16-2018 04:06 PM
The number of files written corresponds to the number of partitions in the Spark DataFrame. To reduce the output to a single file, use coalesce():
sqlDF.coalesce(1).write.csv(<file-path>)...
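A fuller sketch of the same idea, keeping the header and delimiter options from the earlier posts (the wasbs path is the poster's placeholder):
// One part file is written per partition; this shows how many to expect.
println(sqlDF.rdd.getNumPartitions)
// Collapse to a single partition, then write one CSV part file.
sqlDF.coalesce(1)
  .write
  .format("csv")
  .option("header", "true")
  .option("delimiter", "|")
  .save("wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv")
The result is still a folder named mydata.csv, but it now contains a single part-00000-*.csv file holding all the data. Note that coalesce(1) funnels everything through one task, so it is only advisable when the output is small enough to be handled by a single executor.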

