Reading from /tmp no longer working
10-14-2022 06:29 AM
Since yesterday, reading a file copied into the cluster is no longer working.
What used to work:
blob = gcs_bucket.get_blob("dev/data.ndjson")        # works
blob.download_to_filename("/tmp/data-copy.ndjson")   # works
df = spark.read.json("/tmp/data-copy.ndjson")        # fails
When calling os.listdir('/tmp'), the file is listed as expected.
This worked yesterday. Has something changed?
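For anyone trying to reproduce this, here is the full pattern as a minimal, self-contained sketch (the bucket name is a placeholder, and spark is assumed to be the SparkSession a Databricks notebook provides):

from google.cloud import storage

# Placeholder bucket name; substitute your own.
gcs_client = storage.Client()
gcs_bucket = gcs_client.bucket("my-bucket")

# The download lands on the driver's local filesystem.
blob = gcs_bucket.get_blob("dev/data.ndjson")
blob.download_to_filename("/tmp/data-copy.ndjson")

# This is the step that fails on the cluster in question.
df = spark.read.json("/tmp/data-copy.ndjson")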
10-18-2022 08:24 AM
Hi @Sarah Usher, could you please share the error message you receive when it fails?
11-27-2022 05:01 AM
Hi @Sarah Usher
Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
01-05-2023 06:55 AM
I encountered this same issue and figured out a fix!
For some reason, it seems like only %sh cells can access the /tmp directory. So I just ran
%sh cp /tmp/<file> /dbfs/<desired-location>
and then accessed it from there using Spark.
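Spelled out with the file name from the original post (the paths are illustrative; /dbfs is the FUSE mount Databricks exposes for DBFS on the driver):

%sh
# Copy from the driver-local /tmp into DBFS via the /dbfs mount.
cp /tmp/data-copy.ndjson /dbfs/tmp/data-copy.ndjson

Then, in a Python cell:

# Read the DBFS copy; the explicit dbfs: scheme avoids any ambiguity
# about which filesystem the path resolves against.
df = spark.read.json("dbfs:/tmp/data-copy.ndjson")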

