Databricks Community

Nik · ‎09-04-2018

Hi,

I am reading a text file from a blob:

val sparkDF = spark.read.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", file_delimiter)
  .load(wasbs_string + "/" + PR_FileName)

Then I test my DataFrame:

sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)

and do some other things, like:

val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)

Up to this point everything works. Finally, I want to write it to another blob.

It works, but the CSV file is empty. Why? Can someone help?

sqlDF.write.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "|")
  .save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")

It runs, but writes no data. And yes, it does create the temp folder "mydata.csv" and the file mydata.csv,

but the file has no headers and no data.

Thanks

User16829051266 · ‎09-04-2018

Hey Nik,

Can you do a file listing on that directory ".../MyPathName/mydata.csv/" and post the names of the files here?

Your data should be in the CSV file(s) whose names begin with "part-" (for example "part-00000-tid-xxxxx.csv"). Each partition is written to a separate CSV file unless you reduce the DataFrame to one partition when writing:

sqlDF.coalesce(1).write.format("com.databricks.spark.csv")...
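The link between partitions and output files can be checked with a small experiment. The following is only a sketch: the object name, the helper `countPartFiles`, the sample data, and the local temp path (standing in for the wasbs:// location) are all made up for illustration.

```scala
import java.nio.file.Files
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object PartitionCountSketch {
  // Hypothetical helper: counts the part-*.csv files Spark wrote under `dir`.
  def countPartFiles(spark: SparkSession, dir: String): Int = {
    val path = new Path(dir)
    val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.globStatus(new Path(path, "part-*.csv")).length
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partition-count-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data, forced into three partitions for the demonstration.
    val df = (1 to 100).toDF("id").repartition(3)
    println(s"partitions: ${df.rdd.getNumPartitions}")

    // Local temp path used here in place of a wasbs:// URI.
    val out = Files.createTempDirectory("csv-demo").resolve("mydata.csv").toString
    df.write.option("header", "true").csv(out)

    // Each non-empty partition becomes its own part file in the folder.
    println(s"part files: ${countPartFiles(spark, out)}")
    spark.stop()
  }
}
```

Calling `sqlDF.rdd.getNumPartitions` before the write is a quick way to predict how many part files to expect.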

Nik · ‎09-04-2018

I only have one file in the folder "MyFolder"; it's a simple readme file.

sqlDF.write.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "|")
  .save("wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv")

I do not see "part-00000-tid-xxxxx.csv". By the way, I am saving into a blob folder. As I mentioned, the file is getting created in the blob folder, but with no data.

User16829051266 · ‎09-04-2018

Can you look in folder "mydata.csv"? In this context mydata.csv should be a folder, not a file.
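To see what Spark actually creates at the target path, the folder can be listed through the Hadoop FileSystem API. This is a minimal sketch under assumptions: the object name, the helper `listNames`, and the sample data are hypothetical, and a local temp path stands in for the wasbs:// location. The empty "mydata.csv" entry that some storage browsers show next to the folder is most likely the zero-byte marker blob the wasbs driver uses to represent a directory, though that is an assumption not confirmed in this thread.

```scala
import java.nio.file.Files
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object ListOutputSketch {
  // Hypothetical helper: names of the entries directly under `dir`.
  def listNames(spark: SparkSession, dir: String): Seq[String] = {
    val path = new Path(dir)
    val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.listStatus(path).map(_.getPath.getName).toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("list-output-sketch")
      .getOrCreate()
    import spark.implicits._

    // Local temp path used here in place of a wasbs:// URI.
    val out = Files.createTempDirectory("csv-demo").resolve("mydata.csv").toString
    Seq((1, "a"), (2, "b")).toDF("id", "value")
      .write.option("header", "true").csv(out)

    // "mydata.csv" is a folder; the data sits in the part-*.csv files inside it.
    listNames(spark, out).foreach(println)
    spark.stop()
  }
}
```

In a Databricks notebook, `dbutils.fs.ls(path)` gives a similar listing without any extra setup.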

User16829051266 · ‎09-04-2018

It would be helpful to provide a screenshot of your Blob storage to see what your directory looks like.

Nik · ‎09-04-2018

OK, let me run it first.

Nik · ‎09-04-2018

Yes, you are right. The file in the root is empty, but the file with the name

part-00001-tid-4180453911607372978-2f22edb4-c9ca-47c8-8791-81cb9b71824c-8-c000.csv

has the data. BUT WHY?

I asked the function to save into:

wasbs://MyContainer@MyBlob.blob.core.windows.net/MyFolder/mydata.csv

Do you know another way to save the data in the right folder and the right file?

Nik · ‎09-04-2018

I tried with:

sqlDF.write.format("com.databricks.spark.csv")

Same thing: the CSV file is empty.

Nik · ‎09-04-2018

screenshot

Nik · ‎09-04-2018

In Python they have something like [-1] that will save the file in the previous folder. Do we have this in Scala?

Nik · ‎09-04-2018

I have noticed that the data has been split into 2 CSV files in the folder. Why is that?

lalithagutthi · ‎09-11-2018

Hi, I have the same issue Nik has been facing. What could be the solution? I am expecting one CSV file rather than a folder containing unknown files.

lalithagutthi · ‎09-13-2018

Can I have an update on this? I had the same issue and am not sure about the resolution.

Nik · ‎09-16-2018

We are going to work on this next week. Please wait, thanks.

User16829051266 · ‎09-16-2018

The number of files written corresponds to the number of partitions in the Spark DataFrame. To reduce the number to 1 file, use coalesce():

sqlDF.coalesce(1).write.csv(<file-path>)...
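Even with coalesce(1), Spark still writes a folder containing one part file. For readers who need a single file with a chosen name, one possible approach is to write into a temporary folder and then move the lone part file with the Hadoop FileSystem API. This is a sketch, not an official recipe: `writeSingleCsv`, the "_tmp" suffix, and the sample data are hypothetical, and the demo uses a local temp path on the assumption that a wasbs:// URI could be substituted once the storage account is configured.

```scala
import java.nio.file.Files
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleCsvSketch {
  // Hypothetical helper: writes `df` as one CSV file located exactly at `target`.
  def writeSingleCsv(spark: SparkSession, df: DataFrame, target: String): Unit = {
    val targetPath = new Path(target)
    val tmpDir = new Path(target + "_tmp") // Spark creates this as a folder
    val fs = targetPath.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // One partition means Spark writes a single part-*.csv file.
    df.coalesce(1).write
      .mode("overwrite")
      .option("header", "true")
      .option("delimiter", "|")
      .csv(tmpDir.toString)

    // Locate the single part file inside the temporary folder.
    val partFile = fs.globStatus(new Path(tmpDir, "part-*.csv")).head.getPath

    // Move it to the requested name, then remove the temporary folder.
    if (fs.exists(targetPath)) fs.delete(targetPath, true)
    fs.rename(partFile, targetPath)
    fs.delete(tmpDir, true)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("single-csv-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val out = Files.createTempDirectory("csv-demo").resolve("mydata.csv").toString
    writeSingleCsv(spark, df, out)
    spark.stop()
  }
}
```

Note that coalesce(1) funnels all rows through a single task, so this pattern suits small outputs such as lookup tables better than large datasets.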

write from a Dataframe to a CSV file, CSV file is blank
