Data Engineering
Reading from /tmp no longer working

su
New Contributor

Since yesterday, reading a file copied into the cluster is no longer working.

What used to work:

blob = gcs_bucket.get_blob("dev/data.ndjson") -> works

blob.download_to_filename("/tmp/data-copy.ndjson") -> works

df = spark.read.json("/tmp/data-copy.ndjson") -> fails

When calling os.listdir('/tmp'), the file is listed as expected.

This worked yesterday. Has something changed?
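
Put together, the sequence above runs in a single notebook cell, roughly like this (a minimal sketch; gcs_bucket is assumed to be a google-cloud-storage Bucket object created earlier in the notebook, and the bucket name below is a placeholder):

import os
from google.cloud import storage

# gcs_bucket was created earlier along these lines:
# gcs_bucket = storage.Client().bucket("my-bucket")

blob = gcs_bucket.get_blob("dev/data.ndjson")        # works
blob.download_to_filename("/tmp/data-copy.ndjson")   # works: the file lands on the driver's local disk
print(os.listdir("/tmp"))                            # the file is listed here
df = spark.read.json("/tmp/data-copy.ndjson")        # fails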

3 REPLIES

Debayan
Esteemed Contributor III

Hi @Sarah Usher, could you please share the error you receive when it fails?

Anonymous
Not applicable

Hi @Sarah Usher,

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Evan_From_Bosto
New Contributor II

I encountered this same issue, and figured out a fix!

For some reason, it seems like only %sh cells can access the /tmp directory. So I just did...

%sh cp /tmp/<file> /dbfs/<desired-location> and then accessed it from there using Spark.
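
A rough Python-cell equivalent of the same workaround (a sketch, assuming a standard Databricks notebook where dbutils is available; the file and folder names are placeholders):

# copy the file from the driver's local /tmp into DBFS so Spark can see it
dbutils.fs.cp("file:/tmp/data-copy.ndjson", "dbfs:/tmp/data-copy.ndjson")

# then read it through the DBFS path
df = spark.read.json("dbfs:/tmp/data-copy.ndjson")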
