11-23-2021 10:31 PM
I'm trying to read a small .txt file that was added as a table to the default database on Databricks. When I try to read the file via the local file API, I get a `FileNotFoundError`, but I'm able to read the same file as a Spark RDD using SparkContext.
Please find the code below:
with open("/FileStore/tables/boringwords.txt", "r") as f_read:
    for line in f_read:
        print(line)
The error I get is:
FileNotFoundError Traceback (most recent call last)
<command-2618449717515592> in <module>
----> 1 with open("dbfs:/FileStore/tables/boringwords.txt", "r") as f_read:
2 for line in f_read:
3 print(line)
FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/FileStore/tables/boringwords.txt'
Whereas I have no problem reading the file using SparkContext:
boring_words = sc.textFile("/FileStore/tables/boringwords.txt")
set(i.strip() for i in boring_words.collect())
And as expected, I get the result for the above block of code:
Out[4]: {'mad',
'mobile',
'filename',
'circle',
'cookies',
'immigration',
'anticipated',
'editorials',
'review'}
I was also going through the DBFS documentation to understand the local file API's limitations, but it gave me no lead on the issue. Any help would be greatly appreciated. Thanks!
11-23-2021 11:49 PM
Can you try with /dbfs/FileStore/tables/boringwords.txt?
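For clarity, a minimal sketch of what that suggestion looks like, using the path from the original post (this assumes the /dbfs FUSE mount is available on your cluster):

# Local Python file APIs go through the /dbfs FUSE mount rather than the dbfs:/ URI scheme
with open("/dbfs/FileStore/tables/boringwords.txt", "r") as f_read:
    for line in f_read:
        print(line)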
11-24-2021 12:58 AM
Hey there @Werner Stinckens! Thanks for your response!
I've tried your suggestion and I still get the same error!
Please find the attached screenshot below:
Moreover, I've realized that adding `/dbfs` to the path is optional, as I've stored the data in the default database. Refer to the OP, where I create an RDD by passing the path `"/FileStore/tables/boringwords.txt"` to `sc.textFile`.
Thank you!
11-24-2021 01:11 AM
You forgot a "/" as the first character in your file path.
11-24-2021 01:52 AM
Hello @Werner Stinckens!
You're right! I missed the '/' earlier.
But nothing changed after adding the '/' before dbfs. Below is the screenshot:
Moreover, when I tried the same path notation with SparkContext, it threw an error:
I'm starting to wonder if this is the right way to provide the absolute path.
In contrast, when I gave the path as `"dbfs:/FileStore/tables/boringwords.txt"` to SparkContext, it worked.
But again, that notation doesn't work for reading the file via the local file API.
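To summarise the two notations I've tried (as I understand them; the second one is the one that keeps failing for me):

# Spark APIs accept the dbfs:/ URI scheme (or the bare /FileStore/... path) -- this works for me
boring_words = sc.textFile("dbfs:/FileStore/tables/boringwords.txt")

# The local file API only sees the driver's filesystem, so it needs the /dbfs mount prefix -- this still fails for me
with open("/dbfs/FileStore/tables/boringwords.txt", "r") as f_read:
    print(f_read.read())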
11-24-2021 02:08 AM
No, that should work.
I just tested it in my environment.
Also:
https://docs.microsoft.com/en-us/azure/databricks/data/databricks-file-system#python
But maybe you use the Community Edition of Databricks? If I recall correctly, DBFS mounting there is limited, so the local file interface might not work.
(See https://community.databricks.com/s/question/0D53f00001HKIFjCAP/where-is-dbfs-mounted-with-community-...), not sure though.
If not, all I can think of is that the file is not there (i.e. an incorrect path), but SparkContext can find it, so that won't be it.
Proof it works:
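Separately, if the Community Edition limitation is indeed the culprit, one possible workaround (just a sketch, I haven't verified it on Community Edition, and the /tmp destination is only an example) is to copy the file from DBFS to the driver's local disk with dbutils and read it from there:

# Copy from DBFS to the driver's local filesystem, then use plain Python file I/O
dbutils.fs.cp("dbfs:/FileStore/tables/boringwords.txt", "file:/tmp/boringwords.txt")

with open("/tmp/boringwords.txt", "r") as f_read:
    boring_words = {line.strip() for line in f_read}
print(boring_words)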
11-24-2021 02:22 AM
Hey @Werner Stinckens,
My apologies! I forgot to mention that I'm using the Databricks Community Edition. Thanks for the references, much appreciated!!
12-12-2021 08:00 AM
Thank you for the help @Kaniz Fatma!! Appreciate it. 😀