Re: write from a Dataframe to a CSV file, CSV fil...

lalithagutthi · ‎09-16-2018

I do not want the folder. For example, if I specify test.csv, I expect a single CSV file. Instead, I get a folder named test.csv that contains multiple supporting files. Moreover, the data file inside it gets a unique generated name, which makes it difficult to reference by name in ADF.

manojlukhi · ‎02-09-2019

Hey Nik /Maggi

Here are my observations:

1. You cannot pass a file name through the Databricks API to another storage service.

2. Data Lake / Blob storage decides the file names.

3. You can rename the files after saving them.

Here is a solution for you:

# Write the DataFrame as a single file with a default name ("part-00000-XXXXX") to a temp location
TempFilePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
Matrixdatadf.coalesce(1).write \
    .mode("overwrite") \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .save(TempFilePath)
# Now find the file in the temp location, move it to the new location under a new name, and delete the temp directory
readPath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
writePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/MYfolder/ResultFiles"
fname = "test.csv"
file_list = dbutils.fs.ls(readPath)  # list all files in the temp directory
for f in file_list:
    if f.name.startswith("part-00000"):  # find the temp file name
        read_name = f.name
# Move it to the new folder and rename it
dbutils.fs.mv(readPath + "/" + read_name, writePath + "/" + fname)
# Remove the temp folder and the remaining supporting files
dbutils.fs.rm(readPath, recurse=True)
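
The rename step above can also be wrapped in a small helper. This is only a sketch, not the poster's code: `promote_part_file`, `FileInfo` and `FakeFs` are hypothetical names, and `fs` is assumed to behave like `dbutils.fs` (`ls`, `mv`, `rm`). Unlike the loop above, it fails with a clear error if it does not find exactly one part file.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls,
# which expose (at least) a path and a name.
FileInfo = namedtuple("FileInfo", ["path", "name"])


def promote_part_file(fs, temp_dir, dest_path, prefix="part-00000"):
    """Move the single part file in temp_dir to dest_path, then delete temp_dir.

    `fs` is assumed to offer ls/mv/rm like dbutils.fs.
    """
    parts = [f.name for f in fs.ls(temp_dir) if f.name.startswith(prefix)]
    if len(parts) != 1:
        raise ValueError(
            "expected exactly one %s* file in %s, found %d"
            % (prefix, temp_dir, len(parts))
        )
    fs.mv(temp_dir.rstrip("/") + "/" + parts[0], dest_path)
    fs.rm(temp_dir, True)  # recursive delete of the temp folder
    return dest_path


class FakeFs:
    """In-memory stand-in for dbutils.fs, only for trying the helper outside Databricks."""

    def __init__(self, files):
        self.files = set(files)

    def ls(self, directory):
        d = directory.rstrip("/") + "/"
        return [FileInfo(p, p[len(d):]) for p in sorted(self.files) if p.startswith(d)]

    def mv(self, src, dst):
        self.files.remove(src)
        self.files.add(dst)

    def rm(self, directory, recurse=False):
        d = directory.rstrip("/") + "/"
        self.files = {p for p in self.files if not p.startswith(d)}


# Try it against the fake file system
fs = FakeFs({"/tmp/test/_SUCCESS", "/tmp/test/part-00000-abc.csv"})
promote_part_file(fs, "/tmp/test", "/out/test.csv")
```

In a notebook the call would presumably be `promote_part_file(dbutils.fs, readPath, writePath + "/" + fname)`.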

Happy to help if you need anything else.

Iyyappan · ‎05-16-2019

@Maggie Chu @lalitha gutthi Do you have a solution for this issue? I am facing the same problem: a folder is created in read-only mode, but there are no files inside it. I'm using Spark 2.3.1.

Iyyappan · ‎05-16-2019

I found the answer: the input file directory and the output file directory must not be the same.
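
A guard like the following can catch that mistake before the write starts. It is a minimal sketch with hypothetical helper names (`normalize_dir`, `assert_distinct_dirs`), and it only compares the two path strings after dropping trailing slashes. It does not resolve mounts or aliases.

```python
def normalize_dir(path):
    """Strip trailing slashes so 'a/b/' and 'a/b' compare equal."""
    return path.rstrip("/")


def assert_distinct_dirs(input_dir, output_dir):
    """Raise ValueError if the read and write locations are the same directory."""
    if normalize_dir(input_dir) == normalize_dir(output_dir):
        raise ValueError(
            "input and output directories must differ: %s" % input_dir
        )
    return True


# Hypothetical paths, for illustration only
assert_distinct_dirs("/data/in", "/data/out")
```

Calling it with the same directory for both arguments, for example `assert_distinct_dirs("/data/in/", "/data/in")`, raises `ValueError`.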

nl09 · ‎06-25-2020

Create a temp folder inside the output folder. Copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. Here is a Python code snippet that does this:

fpath = output + '/' + 'temp'

def file_exists(path):
  # dbutils.fs.ls raises an exception if the path does not exist
  try:
    dbutils.fs.ls(path)
    return True
  except Exception as e:
    if 'java.io.FileNotFoundException' in str(e):
      return False
    else:
      raise

# Remove a leftover temp folder (recursively) before writing
if file_exists(fpath):
  dbutils.fs.rm(fpath, True)
spark.sql(query).coalesce(1).write.csv(fpath)

# Copy the single part file to the output folder under the desired name, then delete the temp folder
fname = [x.name for x in dbutils.fs.ls(fpath) if x.name.startswith('part-00000')]
dbutils.fs.cp(fpath + "/" + fname[0], output + "/" + "somefile.csv")
dbutils.fs.rm(fpath, True)

----