09-04-2018 10:03 AM
Hi,
I am reading a text file from a blob:
val sparkDF = spark.read.format(file_type)
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", file_delimiter)
.load(wasbs_string + "/" + PR_FileName)
Then I test my DataFrame:
sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)
and do some other things, like:
val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)
Up to this point everything works. Finally, I want to write it to another blob:
It runs, but the CSV file is empty. Why? Can someone help?
sqlDF.write.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")
It runs with no errors, and yes, it does create the temp folder "mydata.csv" and the file mydata.csv,
but the file has no headers and no data.
thanks
09-04-2018 10:59 AM
Hey Nik,
Can you do a file listing on that directory ".../MyPathName/mydata.csv/" and post the names of the files here?
Your data should be located in the CSV file(s) that begin with "part-00000-tid-xxxxx.csv", with each partition written to a separate CSV file, unless you coalesce to a single partition when writing:
sqlDF.coalesce(1).write.format("com.databricks.spark.csv")...
09-04-2018 12:13 PM
I only have one file in the folder "MyFolder"; it's a simple readme file.
sqlDF.write.format(file_type)
  .option("header", "true")
  .option("delimiter", "|")
  .save("wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv")
I do not see "part-00000-tid-xxxxx.csv". By the way, I am saving into a blob folder; as I mentioned, the file is getting created in the blob folder but with no data.
09-04-2018 12:32 PM
Can you look in folder "mydata.csv"? In this context mydata.csv should be a folder, not a file.
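A quick way to check is to list the contents of that directory from the notebook. This is a sketch assuming a Databricks notebook where `dbutils` is in scope; the path is the example path from this thread:

```scala
// "mydata.csv" is a directory created by Spark, not a single file.
// Listing it shows the files Spark actually wrote.
val files = dbutils.fs.ls(
  "wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv/")
files.foreach(f => println(f.path))
// Typically you will see bookkeeping files such as _SUCCESS alongside
// one or more part-*.csv files; the part-*.csv files hold the data.
```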
09-04-2018 12:40 PM
It would be helpful to provide a screenshot of your Blob storage to see what your directory looks like.
09-04-2018 12:46 PM
ok , let me run it first
09-04-2018 12:51 PM
Yes, you are right. The file in the root is empty, but the file named
part-00001-tid-4180453911607372978-2f22edb4-c9ca-47c8-8791-81cb9b71824c-8-c000.csv
has the data. But WHY??
I asked the function to save into
wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv
Do you know another way to save the data in the right folder, with the right file name?
09-04-2018 12:57 PM
I tried with
sqlDF.write.format("com.databricks.spark.csv")
and got the same result: the CSV file in the root is empty.
09-04-2018 01:06 PM
In Python there is something like [-1] that will save the file in the parent folder. Do we have this in Scala?
09-04-2018 07:16 PM
I have noticed that the data has been split into 2 CSV files in the folder. WHY is that??
09-11-2018 11:47 AM
Hi, I have the same issue Nik has been facing. What could be the solution? I am expecting one CSV file rather than a folder containing unknown files.
09-13-2018 08:34 AM
Can I have an update on this? I had the same issue and am not sure about the resolution.
09-16-2018 03:30 PM
We are going to work on this next week. Please wait, thanks.
09-16-2018 04:06 PM
The number of files written corresponds to the number of partitions in the Spark DataFrame. To reduce the output to a single file, use coalesce():
sqlDF.coalesce(1).write.csv(<file-path>)...