
Issue while trying to read a text file in Databricks using the Local File APIs instead of the Spark API

RiyazAli
Contributor III

I'm trying to read a small .txt file that was added as a table to the default database on Databricks. When I try to read the file via the Local File API, I get a `FileNotFoundError`, but I'm able to read the same file as a Spark RDD using SparkContext.

Please find the code below:

with open("/FileStore/tables/boringwords.txt", "r") as f_read:
  for line in f_read:
    print(line)

The error I get is:

FileNotFoundError                         Traceback (most recent call last)
<command-2618449717515592> in <module>
----> 1 with open("dbfs:/FileStore/tables/boringwords.txt", "r") as f_read:
      2   for line in f_read:
      3     print(line)
 
FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/FileStore/tables/boringwords.txt'

Whereas I have no problem reading the file using SparkContext:

boring_words = sc.textFile("/FileStore/tables/boringwords.txt")
set(i.strip() for i in boring_words.collect())

And as expected, I get the result for the above block of code:

Out[4]: {'mad',
 'mobile',
 'filename',
 'circle',
 'cookies',
 'immigration',
 'anticipated',
 'editorials',
 'review'}

I also went through the DBFS documentation to understand the Local File API's limitations, but found no lead on the issue. Any help would be greatly appreciated. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager

Hi @Riyaz Ali,

In the Community Edition, the /dbfs local mount is disabled on DBR 7 and later.

If you're using the Community Edition, please run your code in a notebook attached to a cluster with a DBR version below 7. It should work there.
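If switching to an older runtime is not an option, one possible workaround is to copy the file from DBFS to the driver's local disk and read it from there. This is a minimal sketch and was not confirmed in this thread: it assumes that the notebook-provided `dbutils` object is available and that `dbutils.fs.cp` with a `file:` destination is permitted on your cluster. The function name and the `/tmp` target are hypothetical.

```python
def read_lines_via_driver_copy(dbutils, dbfs_uri, local_path):
    """Copy a DBFS file to the driver's local disk, then read it with open().

    `dbutils` is the object Databricks notebooks provide; it is passed in
    explicitly so that the function has no hidden globals.
    """
    # The "file:" scheme targets the driver's local filesystem instead of DBFS.
    dbutils.fs.cp(dbfs_uri, "file:" + local_path)
    with open(local_path, "r") as f_read:
        return [line.strip() for line in f_read]


# Hypothetical notebook usage (requires a Databricks notebook session):
# boring_words = set(read_lines_via_driver_copy(
#     dbutils, "dbfs:/FileStore/tables/boringwords.txt", "/tmp/boringwords.txt"))
```

The copy lands on the driver only, so this fits small files like the one in the question, not data that has to be read on the executors.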


9 REPLIES

-werners-
Esteemed Contributor III

Can you try with /dbfs/Filestore/tables/boringwords.txt?

Hey there @Werner Stinckens! Thanks for your response!

I've tried your suggestion and I still get the same error!

Please find the snip below:

[image: error_snip]

Moreover, I've realized that adding `/dbfs` to the path is optional, as I've stored the data in the default database. As shown in the original post, I'm creating an RDD by passing the path `"/FileStore/Tables/filename.txt"` to `sc.textFile`.

Thank you!

-werners-
Esteemed Contributor III

You forgot a "/" as the first character in your file path.

Hello @Werner Stinckens!

You're right! I missed the '/' earlier.

But nothing changed after adding the '/' before dbfs. Below is the snip:

[image]

Moreover, when I tried the same path notation with SparkContext, it threw an error:

[image]

I'm starting to wonder if this is the right way to provide the absolute path.

On the other hand, when I gave the path as "dbfs:/FileStore/tables/boringwords.txt" to SparkContext, it worked:

[image]

But again, this doesn't work for reading the file with the Local File API.
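The two notations tried above belong to different interfaces: Spark APIs such as `sc.textFile` take DBFS paths, optionally with the `dbfs:/` scheme, while local file APIs such as `open()` expect the `/dbfs/` prefix, and only where that mount is available. A minimal sketch of the mapping follows; the helper names are hypothetical and not part of any Databricks API.

```python
def _strip_dbfs_prefix(path):
    """Return the path relative to the DBFS root, without a leading slash."""
    # Note: a DBFS directory that is literally named "dbfs" at the root would
    # be ambiguous here; this sketch ignores that case.
    for prefix in ("dbfs:/", "/dbfs/"):
        if path.startswith(prefix):
            return path[len(prefix):].lstrip("/")
    return path.lstrip("/")


def to_local_path(path):
    """Form expected by local file APIs such as open()."""
    return "/dbfs/" + _strip_dbfs_prefix(path)


def to_spark_uri(path):
    """Form accepted by Spark APIs such as sc.textFile()."""
    return "dbfs:/" + _strip_dbfs_prefix(path)


print(to_local_path("dbfs:/FileStore/tables/boringwords.txt"))
print(to_spark_uri("/dbfs/FileStore/tables/boringwords.txt"))
```

This only rewrites strings. It does not check that the file exists or that `/dbfs` is mounted on the cluster.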

-werners-
Esteemed Contributor III

No, that should work.

I just tested it on my environment.

Also:

https://docs.microsoft.com/en-us/azure/databricks/data/databricks-file-system#python

https://community.databricks.com/s/question/0D53f00001HKHS7CAP/python-open-function-is-unable-to-det...

But maybe you are using the Community Edition of Databricks? If I recall correctly, DBFS mounting is limited there, so the local file interface might not work.

(See https://community.databricks.com/s/question/0D53f00001HKIFjCAP/where-is-dbfs-mounted-with-community-...), not sure though.

If not, all I can think of is that the file is not there (an incorrect path), but SparkContext can find it, so that can't be it.
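To tell those two cases apart (local mount unavailable versus wrong path), a quick check from the notebook can help. This is a minimal sketch: the function name and return labels are hypothetical, and it assumes that a missing `/dbfs` directory on the driver means the local file interface is unavailable.

```python
import os


def diagnose_local_read(local_path, mount_root="/dbfs"):
    """Coarse check of why open(local_path) might raise FileNotFoundError."""
    if not os.path.isdir(mount_root):
        # The mount point itself is absent on the driver.
        return "mount_root_missing"
    if not os.path.isfile(local_path):
        # The mount point exists, but the file is not visible under it.
        return "file_missing"
    return "ok"


print(diagnose_local_read("/dbfs/FileStore/tables/boringwords.txt"))
```

The check is coarse: if the mount directory exists but is empty, it reports `file_missing` even though the underlying cause may still be the mount.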

Proof that it works:

[image]

Hey @Werner Stinckens,

My apologies! Forgot to mention that I'm using the Databricks community edition. Thanks for the references, much appreciated!!


Thank you for the help, @Kaniz Fatma! Appreciate it. 😀

Kaniz
Community Manager

@Riyaz Ali, I'm happy to know that it helped you 😊.
