11-23-2021 10:31 PM
I'm trying to read a small txt file which is added as a table to the default db on Databricks. While trying to read the file via the Local File API, I get a `FileNotFoundError`, but I'm able to read the same file as a Spark RDD using SparkContext.
Please find the code below:
with open("/FileStore/tables/boringwords.txt", "r") as f_read:
    for line in f_read:
        print(line)
The error I get is:
FileNotFoundError                         Traceback (most recent call last)
<command-2618449717515592> in <module>
----> 1 with open("dbfs:/FileStore/tables/boringwords.txt", "r") as f_read:
      2     for line in f_read:
      3         print(line)

FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/FileStore/tables/boringwords.txt'
In contrast, I have no problem reading the file using SparkContext:
boring_words = sc.textFile("/FileStore/tables/boringwords.txt")
set(i.strip() for i in boring_words.collect())
And as expected, I get the result for the above block of code:
Out[4]: {'mad',
'mobile',
'filename',
'circle',
'cookies',
'immigration',
'anticipated',
'editorials',
'review'}
I was also referring to the DBFS documentation to understand the Local File API's limitations, but it gave me no lead on the issue. Any help would be greatly appreciated. Thanks!
11-23-2021 11:49 PM
Can you try with `/dbfs/FileStore/tables/boringwords.txt`?
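Something like this, to be explicit (a minimal sketch, assuming the file really sits under dbfs:/FileStore/tables/ and that your cluster exposes the DBFS FUSE mount at /dbfs):

```python
# The local file API reads through the /dbfs FUSE mount, not the dbfs:/ URI scheme
with open("/dbfs/FileStore/tables/boringwords.txt", "r") as f_read:
    for line in f_read:
        print(line.strip())
```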
11-24-2021 12:58 AM
Hey there @Werner Stinckens! Thanks for your response!
I've tried your suggestion and I still get the same error!
PFA the snip below:
Moreover, I've realized that adding ```/dbfs``` to the path is optional, as I've stored the data in the default database. Refer to the OP, where I'm creating an RDD by passing the path ```"/FileStore/Tables/filename.txt"``` to `sc.textFile`.
Thank you!
11-24-2021 01:11 AM
You forgot a "/" as the first character in your file path.
11-24-2021 01:52 AM
Hello @Werner Stinckens!
You're right! I missed the '/' earlier.
But nothing changed after adding the '/' before dbfs. Below is the snip:
Moreover, when I tried the same path notation with SparkContext, it threw me an error:
I'm starting to wonder if this is the right way to provide the absolute path.
In contrast, when I gave the path as "dbfs:/FileStore/tables/boringwords.txt", it worked.
But again, this doesn't work for reading the file via the Local File API.
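To sum up the combinations I've tried so far (a quick sketch; the comments are just the results I'm observing in my notebook, not documented behaviour):

```python
# SparkContext
sc.textFile("/FileStore/tables/boringwords.txt")        # works
sc.textFile("dbfs:/FileStore/tables/boringwords.txt")   # works
sc.textFile("/dbfs/FileStore/tables/boringwords.txt")   # throws an error

# Local file API
open("/FileStore/tables/boringwords.txt", "r")          # FileNotFoundError
open("dbfs:/FileStore/tables/boringwords.txt", "r")     # FileNotFoundError
open("/dbfs/FileStore/tables/boringwords.txt", "r")     # same FileNotFoundError
```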
11-24-2021 02:08 AM
No, that should work.
I just tested it on my environment.
Also:
https://docs.microsoft.com/en-us/azure/databricks/data/databricks-file-system#python
But maybe you're using the Community Edition of Databricks? If I recall correctly, DBFS mounting is limited there, so the local file interface might not work.
(See https://community.databricks.com/s/question/0D53f00001HKIFjCAP/where-is-dbfs-mounted-with-community-...), not sure though.
If not, all I can think of is that the file is not there (so an incorrect path), but SparkContext can find it, so that won't be it.
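If you want to rule out a wrong path anyway, something like this lists what's actually under that folder (just a sketch; `dbutils.fs.ls` is available in Databricks notebooks):

```python
# List the DBFS folder to confirm the exact file name and path casing
for f in dbutils.fs.ls("dbfs:/FileStore/tables/"):
    print(f.path)
```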
Proof it works:
11-24-2021 02:22 AM
Hey @Werner Stinckens,
My apologies! Forgot to mention that I'm using the Databricks Community Edition. Thanks for the references, much appreciated!!
12-12-2021 08:00 AM
Thank you for the help @Kaniz Fatma!! Appreciate it.