09-04-2018 10:03 AM
Hi,
I am reading a text file from a blob:
val sparkDF = spark.read.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", file_delimiter)
  .load(wasbs_string + "/" + PR_FileName)
Then I test my DataFrame:
sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)
Then I do some other things, like:
val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)
Up to this point everything works. Finally, I want to write it to another blob.
It runs, but the CSV file is empty. Can someone help?
sqlDF.write.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")
It runs with no errors, and yes, it does create the folder "mydata.csv" and a file inside it, but the file has no headers and no data.
Thanks
09-16-2018 05:08 PM
I do not want the folder. For example, if I specify test.csv, I expect a CSV file. Instead, it creates a test.csv folder containing multiple supporting files. Moreover, the data file comes out with a unique generated name, which makes it difficult for my ADF call to identify the file.
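Spark writes one file per partition into the target directory, and each part file's name includes a job-specific ID, so the exact name cannot be predicted up front. What is stable is the "part-00000" prefix, so the file can be located by pattern. A minimal local-filesystem sketch (the directory contents here are made up to mimic what Spark produces):

```python
import fnmatch
import os
import tempfile

# Simulate the directory Spark creates: part files plus metadata markers.
outdir = tempfile.mkdtemp()
for name in ["_SUCCESS", "part-00000-1a2b3c4d.csv"]:
    open(os.path.join(outdir, name), "w").close()

# Find the data file by its stable "part-00000" prefix, not its full name.
part_files = [n for n in os.listdir(outdir) if fnmatch.fnmatch(n, "part-00000*")]
print(part_files)  # ['part-00000-1a2b3c4d.csv']
```

On Databricks you would list the blob directory with dbutils.fs.ls and match on the same prefix, as the answers below do.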
02-09-2019 12:51 AM
Hey Nik / Maggi,
Here are my observations:
1. You cannot pass a file name through the Databricks API to the storage service.
2. The Data Lake / Blob layer decides the file names.
3. You can rename files after saving them.
Here is a solution for you:
###### Write your DataFrame as a single part file with a default name ("part-00000-XXXXX") to a temp location
TempFilePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
Matrixdatadf.coalesce(1).write \
    .mode("overwrite") \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .save(TempFilePath)
####### Now read the file from the temp location, write it to the new location under the new name, and delete the temp directory
readPath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
writePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/MYfolder/ResultFiles"
file_list = dbutils.fs.ls(readPath)  #### list all files in the temp directory
fname = "test.csv"
for f in file_list:
    if f.name.startswith("part-00000"):  #### find the part file Spark wrote
        read_name = f.name
##### Move it out to the new folder and rename it
dbutils.fs.mv(readPath + "/" + read_name, writePath + "/" + fname)
##### Remove the leftover temp folder
dbutils.fs.rm(readPath, recurse=True)
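The same write-to-temp / find-part-file / move-and-rename pattern can be wrapped in one helper. This is a local-filesystem sketch of that logic (os/shutil stand in for dbutils.fs, and the directory and file names are illustrative; on Databricks you would substitute dbutils.fs.mv and dbutils.fs.rm):

```python
import os
import shutil
import tempfile

def move_single_csv(temp_dir, target_dir, final_name):
    """Move the single Spark part file out of temp_dir, rename it, drop temp_dir."""
    # Locate the one data file by its stable prefix.
    part = next(n for n in os.listdir(temp_dir) if n.startswith("part-00000"))
    os.makedirs(target_dir, exist_ok=True)
    shutil.move(os.path.join(temp_dir, part), os.path.join(target_dir, final_name))
    shutil.rmtree(temp_dir)  # remove the temp folder and its _SUCCESS markers

# Demo with a fake Spark output directory.
tmp = tempfile.mkdtemp()
out = tempfile.mkdtemp()
with open(os.path.join(tmp, "part-00000-abc.csv"), "w") as f:
    f.write("a|b\n1|2\n")
open(os.path.join(tmp, "_SUCCESS"), "w").close()

move_single_csv(tmp, out, "test.csv")
print(sorted(os.listdir(out)))  # ['test.csv']
```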
Happy to help if anything else is needed.
05-16-2019 10:08 AM
@Maggie Chu @lalitha gutthi Do you have any solution for this issue? I am facing the same problem: a folder gets created in read-only mode, but there are no files inside it. I'm using Spark 2.3.1.
05-16-2019 08:02 PM
I got the answer: the input file directory and the output file directory must not be the same.
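The likely reason: with mode("overwrite"), Spark clears the target directory before the (lazily planned) read actually pulls the data, so if input and output point at the same path the source files are destroyed mid-job. A local sketch of that failure mode (the paths are illustrative, and shutil.rmtree stands in for what overwrite does to the target):

```python
import os
import shutil
import tempfile

# A stand-in for the source directory, which is also the write target.
src = tempfile.mkdtemp()
with open(os.path.join(src, "part-00000.csv"), "w") as f:
    f.write("a,b\n1,2\n")

# Overwrite mode clears the target first; if target == source,
# there is nothing left to read by the time the job runs.
shutil.rmtree(src)
print(os.path.exists(src))  # False
```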
06-25-2020 09:15 AM
Create a temp folder inside the output folder, copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. Python code snippet to do the same:
fpath = output + '/' + 'temp'

def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except Exception as e:
        if 'java.io.FileNotFoundException' in str(e):
            return False
        else:
            raise

if file_exists(fpath):
    dbutils.fs.rm(fpath, recurse=True)
spark.sql(query).coalesce(1).write.csv(fpath)
fname = [x.name for x in dbutils.fs.ls(fpath) if x.name.startswith('part-00000')]
dbutils.fs.cp(fpath + "/" + fname[0], output + "/" + "somefile.csv")
dbutils.fs.rm(fpath, recurse=True)