09-04-2018 10:03 AM
Hi,
I am reading a text file from a blob:
val sparkDF = spark.read.format(file_type)
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", file_delimiter)
.load(wasbs_string + "/" + PR_FileName)
Then I test my DataFrame:
sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)
and do some other things, like:
val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)
Up to this point everything works. Finally, I want to write it to another blob:
sqlDF.write.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", "|")
.save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")
It runs with no errors, and it does create the folder "mydata.csv" and the file mydata.csv inside it, but the file has no headers and no data. Why is the CSV empty? Can someone help?
Thanks
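A quick sanity check, as a sketch in a Python notebook cell reusing the path from the post above: Spark always writes a directory, so list what it actually produced. The data, if any, lives in the part-0000* files inside the "mydata.csv" folder.
out_dir = "wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv"
for f in dbutils.fs.ls(out_dir):   # list the folder Spark created
    print(f.name, f.size)          # expect part-00000-*.csv files plus _SUCCESS markers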
09-16-2018 05:08 PM
I do not want the folder. For example, if I am given test.csv, I expect a single CSV file named test.csv. Instead, a test.csv folder is created containing multiple supporting files. Moreover, the data file inside it gets a unique, generated name, which makes it difficult for my ADF call to identify the file by name.
02-09-2019 12:51 AM
Hey Nik / Maggi,
Here are my observations:
1. You cannot pass a file name through the Databricks API to the storage service.
2. Data Lake / Blob storage decides the part-file names.
3. You can rename the files after saving them.
Here is a solution for you:
###### Write your DataFrame to a single part file (default name "part-00000-...") in a temp location
TempFilePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
Matrixdatadf.coalesce(1).write\
.mode("overwrite")\
.format("com.databricks.spark.csv")\
.option("header", "true")\
.save(TempFilePath)
####### Now move the file from the temp location to the new location with the new name, then delete the temp directory
readPath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
writePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/MYfolder/ResultFiles"
fname = "test.csv"

file_list = dbutils.fs.ls(readPath)     #### List all files in the temp directory
for i in file_list:
    if i[1].startswith("part-00000"):   #### i[1] is the file name; find the part file Spark wrote
        read_name = i[1]

##### Move it out to the new folder and rename it
dbutils.fs.mv(readPath + "/" + read_name, writePath + "/" + fname)

##### Remove the now-empty temp folder
dbutils.fs.rm(readPath, recurse=True)
Happy to help if you need anything else.
05-16-2019 10:08 AM
@Maggie Chu @lalitha gutthi Do you have any solution for this issue? I am facing the same problem: a folder is getting created in read-only mode, but there are no files inside it. I'm using Spark 2.3.1.
05-16-2019 08:02 PM
I found the answer: the input file directory and the output file directory must not be the same.
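To illustrate (a minimal sketch with hypothetical container/account names, not the poster's actual setup): read from one directory and write to a different one, otherwise the overwrite can clobber the very data being read.
in_path = "wasbs://mycontainer@myaccount.blob.core.windows.net/input/"
out_path = "wasbs://mycontainer@myaccount.blob.core.windows.net/output/"  # must differ from in_path
df = spark.read.option("header", "true").csv(in_path)
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(out_path)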
06-25-2020 09:15 AM
Create a temp folder inside the output folder, copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. The Python snippet below does exactly that.
fpath = output + '/' + 'temp'

def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except Exception as e:
        if 'java.io.FileNotFoundException' in str(e):
            return False
        else:
            raise

if file_exists(fpath):
    dbutils.fs.rm(fpath, True)   # clear any leftover temp folder
spark.sql(query).coalesce(1).write.csv(fpath)

fname = [x.name for x in dbutils.fs.ls(fpath) if x.name.startswith('part-00000')]
dbutils.fs.cp(fpath + "/" + fname[0], output + "/" + "somefile.csv")
dbutils.fs.rm(fpath, True)
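One note on the snippet above: write.csv as written does not include a header row (which the original question asked for). If headers are needed, a small variant (same query and fpath as above) would be:
spark.sql(query).coalesce(1).write.option("header", "true").csv(fpath)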