Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Write from a DataFrame to a CSV file, CSV file is blank

Nik
New Contributor III

Hi,

I am reading a text file from a blob:

val sparkDF = spark.read.format(file_type)
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", file_delimiter)
.load(wasbs_string + "/" + PR_FileName)

Then I test my DataFrame:

sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)

and do some other things, like:

val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)

Up to this point everything works. Finally, I want to write it to another blob; the write runs, but the CSV file it produces is empty. Can someone help?

sqlDF.write.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "|")
  .save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")

It completes with no data. And yes, it does create the folder "mydata.csv" and a file inside it, but that file has no headers and no data.

Thanks

19 REPLIES

I do not want the folder. For example, if I am given test.csv, I am expecting a CSV file, but it shows up as a test.csv folder containing multiple supporting files. Moreover, the data file comes out with a unique name, which makes it difficult for my ADF call to identify it.

manojlukhi
New Contributor II

Hey Nik / Maggi,

Here are my observations:

1. You cannot pass a file name through the Databricks API to the storage service.

2. The data lake / blob storage decides the file names.

3. You can rename files after saving them.

Here is a solution for you:

###### Write your DataFrame to a single part file ("part-00000-...") in a temp location
TempFilePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
Matrixdatadf.coalesce(1).write\
    .mode("overwrite")\
    .format("com.databricks.spark.csv")\
    .option("header", "true")\
    .save(TempFilePath)

###### Now find the part file in the temp location, move it to the new location with the new name, and delete the temp directory
readPath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
writePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/MYfolder/ResultFiles"
fname = "test.csv"
file_list = dbutils.fs.ls(readPath)  # list all files in the temp directory
for i in file_list:
    if i.name.startswith("part-00000"):  # find the part file Spark wrote
        read_name = i.name
# Move it to the target folder and rename it
dbutils.fs.mv(readPath + "/" + read_name, writePath + "/" + fname)
# Remove the now-redundant temp folder
dbutils.fs.rm(readPath, recurse=True)

Happy to help if any further help is required.

Iyyappan
New Contributor II

@Maggie Chu @lalitha gutthi Do you have any solution for this issue? I am facing the same problem: a folder is getting created in read-only mode, but there are no files inside it. I'm using Spark 2.3.1.

Iyyappan
New Contributor II

I got the answer: the input file directory and the output file directory should not be the same.
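
For example, a minimal sketch of the pattern (the container, account, and folder names here are hypothetical):

# Read from one directory and write the result to a *different* one;
# per the above, pointing the write at the directory you are reading from
# is what leaves you with an empty or unreadable output folder.
input_path = "wasbs://mycontainer@myaccount.blob.core.windows.net/input/data.csv"
output_path = "wasbs://mycontainer@myaccount.blob.core.windows.net/output/result"

df = spark.read.option("header", "true").csv(input_path)
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(output_path)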

nl09
New Contributor II

Create a temp folder inside the output folder, copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. Python code snippet to do the same:

fpath = output + '/' + 'temp'

def file_exists(path):
  try:
    dbutils.fs.ls(path)
    return True
  except Exception as e:
    if 'java.io.FileNotFoundException' in str(e):
      return False
    else:
      raise

# Remove any leftover temp folder, then write the result as a single part file
if file_exists(fpath):
  dbutils.fs.rm(fpath, True)
spark.sql(query).coalesce(1).write.csv(fpath)

# Copy the part file to the output folder under the desired name, then delete the temp folder
fname = [x.name for x in dbutils.fs.ls(fpath) if x.name.startswith('part-00000')]
dbutils.fs.cp(fpath + "/" + fname[0], output + "/" + "somefile.csv")
dbutils.fs.rm(fpath, True)
