09-04-2018 10:03 AM
Hi,
I am reading a text file from a blob:
val sparkDF = spark.read.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", file_delimiter)
  .load(wasbs_string + "/" + PR_FileName)
Then I test my DataFrame:
sparkDF.createOrReplaceTempView("tblLookupTable")
//sparkDF.show()
//sparkDF.show(10)
//sparkDF.take(5)
Then I do some other things, like:
val sqlDF = spark.sql("SELECT * FROM tblLookupTable")
//sqlDF.printSchema()
//sqlDF.show(1000)
Up to this point everything works. Finally, I want to write it to another blob.
It runs, but the CSV file is empty. Can someone help?
sqlDF.write.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .save("wasbs://MyRootName@MyBlobName.blob.core.windows.net/MyPathName/mydata.csv")
It runs with no errors, and yes, it does create the folder "mydata.csv" and a file inside it, but the file has no headers and no data.
Thanks
09-16-2018 05:08 PM
I do not want the folder. For example, if I specify test.csv, I expect a CSV file. Instead, it creates a test.csv folder containing multiple supporting files. Moreover, the data file comes out with a unique generated name, which makes it difficult for my ADF call to identify the file.
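Spark writes one file per partition into the target directory, and each part file's name includes a job-specific ID, so the exact name cannot be predicted up front. What is stable is the "part-00000" prefix, so the file can be located by pattern. A minimal local-filesystem sketch (the directory contents here are made up to mimic what Spark produces):

```python
import fnmatch
import os
import tempfile

# Simulate the directory Spark creates: part files plus metadata markers.
outdir = tempfile.mkdtemp()
for name in ["_SUCCESS", "part-00000-1a2b3c4d.csv"]:
    open(os.path.join(outdir, name), "w").close()

# Find the data file by its stable "part-00000" prefix, not its full name.
part_files = [n for n in os.listdir(outdir) if fnmatch.fnmatch(n, "part-00000*")]
print(part_files)  # ['part-00000-1a2b3c4d.csv']
```

On Databricks you would list the blob directory with dbutils.fs.ls and match on the same prefix, as the answers below do.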
02-09-2019 12:51 AM
Hey Nik / Maggi,
Here are my observations:
1. You cannot pass a file name through the Databricks API to the storage service.
2. The Data Lake / Blob layer decides the file names.
3. You can rename files after saving them.
Here is a solution for you:
###### Write your DataFrame as a single part file with a default name ("part-00000-XXXXX") to a temp location
TempFilePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
Matrixdatadf.coalesce(1).write \
    .mode("overwrite") \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .save(TempFilePath)
####### Now read the file from the temp location, write it to the new location under the new name, and delete the temp directory
readPath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/test"
writePath = "wasbs://<DirectoryName>@<Subscription>.blob.core.windows.net/MYfolder/ResultFiles"
file_list = dbutils.fs.ls(readPath)  #### list all files in the temp directory
fname = "test.csv"
for f in file_list:
    if f.name.startswith("part-00000"):  #### find the part file Spark wrote
        read_name = f.name
##### Move it out to the new folder and rename it
dbutils.fs.mv(readPath + "/" + read_name, writePath + "/" + fname)
##### Remove the leftover temp folder
dbutils.fs.rm(readPath, recurse=True)
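The same write-to-temp / find-part-file / move-and-rename pattern can be wrapped in one helper. This is a local-filesystem sketch of that logic (os/shutil stand in for dbutils.fs, and the directory and file names are illustrative; on Databricks you would substitute dbutils.fs.mv and dbutils.fs.rm):

```python
import os
import shutil
import tempfile

def move_single_csv(temp_dir, target_dir, final_name):
    """Move the single Spark part file out of temp_dir, rename it, drop temp_dir."""
    # Locate the one data file by its stable prefix.
    part = next(n for n in os.listdir(temp_dir) if n.startswith("part-00000"))
    os.makedirs(target_dir, exist_ok=True)
    shutil.move(os.path.join(temp_dir, part), os.path.join(target_dir, final_name))
    shutil.rmtree(temp_dir)  # remove the temp folder and its _SUCCESS markers

# Demo with a fake Spark output directory.
tmp = tempfile.mkdtemp()
out = tempfile.mkdtemp()
with open(os.path.join(tmp, "part-00000-abc.csv"), "w") as f:
    f.write("a|b\n1|2\n")
open(os.path.join(tmp, "_SUCCESS"), "w").close()

move_single_csv(tmp, out, "test.csv")
print(sorted(os.listdir(out)))  # ['test.csv']
```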
Happy to help if anything else is needed.
05-16-2019 10:08 AM
@Maggie Chu @lalitha gutthi Do you have any solution for this issue? I am facing the same problem: a folder gets created in read-only mode, but there are no files inside it. I'm using Spark 2.3.1.
05-16-2019 08:02 PM
I got the answer: the input file directory and the output file directory must not be the same.
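The likely reason: with mode("overwrite"), Spark clears the target directory before the (lazily planned) read actually pulls the data, so if input and output point at the same path the source files are destroyed mid-job. A local sketch of that failure mode (the paths are illustrative, and shutil.rmtree stands in for what overwrite does to the target):

```python
import os
import shutil
import tempfile

# A stand-in for the source directory, which is also the write target.
src = tempfile.mkdtemp()
with open(os.path.join(src, "part-00000.csv"), "w") as f:
    f.write("a,b\n1,2\n")

# Overwrite mode clears the target first; if target == source,
# there is nothing left to read by the time the job runs.
shutil.rmtree(src)
print(os.path.exists(src))  # False
```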
06-25-2020 09:15 AM
Create a temp folder inside the output folder, copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. Python code snippet to do the same:
fpath = output + '/' + 'temp'

def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except Exception as e:
        if 'java.io.FileNotFoundException' in str(e):
            return False
        else:
            raise

if file_exists(fpath):
    dbutils.fs.rm(fpath, recurse=True)
spark.sql(query).coalesce(1).write.csv(fpath)
fname = [x.name for x in dbutils.fs.ls(fpath) if x.name.startswith('part-00000')]
dbutils.fs.cp(fpath + "/" + fname[0], output + "/" + "somefile.csv")
dbutils.fs.rm(fpath, recurse=True)