Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

su
by New Contributor
  • 4169 Views
  • 3 replies
  • 0 kudos

Reading from /tmp no longer working

Since yesterday, reading a file copied into the cluster is no longer working. What used to work:
blob = gcs_bucket.get_blob("dev/data.ndjson") -> works
blob.download_to_filename("/tmp/data-copy.ndjson") -> works
df = spark.read.json("/tmp/data-copy.ndjso...

Latest Reply
Evan_From_Bosto
New Contributor II
  • 0 kudos

I encountered this same issue, and figured out a fix! For some reason, it seems like only %sh cells can access the /tmp directory. So I just did
%sh cp /tmp/<file> /dbfs/<desired-location>
and then accessed it from there using Spark.
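A minimal PySpark sketch of the same workaround, assuming the file was already downloaded to the driver's local /tmp (the paths are illustrative):

# Copy the file from the driver's local filesystem into DBFS,
# then read it with Spark from the DBFS path.
dbutils.fs.cp("file:/tmp/data-copy.ndjson", "dbfs:/tmp/data-copy.ndjson")
df = spark.read.json("dbfs:/tmp/data-copy.ndjson")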

2 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 4058 Views
  • 3 replies
  • 3 kudos

Using R, how do we write a CSV file to, say, dbfs:/tmp?

Let us say I already have the data 'TotalData'.
write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)
I do not see any error from the above. When I list the files below:
%fs ls /tmp
I do not see any files written there. Why?

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

Hi Thiam, thank you for reaching out to us. In this case it seems that you have written the file to the OS /tmp and tried to fetch the same folder in DBFS.
Written >> /tmp/TotalData.csv
Reading >> /dbfs/tmp/TotalData.csv
Please try to execute write.csv wit...
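The same path mapping, sketched in Python rather than R and assuming the cluster exposes the /dbfs FUSE mount (the file contents are illustrative):

# Writing to the driver's OS /tmp does not show up in DBFS.
with open("/tmp/TotalData.csv", "w") as f:
    f.write("a,b\n1,2\n")

# Writing through the /dbfs mount lands the file under dbfs:/tmp instead.
with open("/dbfs/tmp/TotalData.csv", "w") as f:
    f.write("a,b\n1,2\n")

# The file is now visible when listing DBFS.
display(dbutils.fs.ls("dbfs:/tmp"))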

2 More Replies
j02424
by New Contributor
  • 3488 Views
  • 1 reply
  • 4 kudos

Best practice to delete /dbfs/tmp?

What is the best practice regarding the tmp folder? We have a very large amount of data in that folder and are not sure whether to delete it, back it up, etc.

Latest Reply
Debayan
Databricks Employee
  • 4 kudos

/dbfs/tmp can contain a lot of files, including temporary system files used for intermediary calculations, or other subdirectories which can contain packages of user-defined installations. It is always better to back up the files.
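If a backup is wanted before any cleanup, a minimal sketch (the backup destination is an assumed, illustrative path) is a recursive DBFS copy:

# Recursively copy everything under dbfs:/tmp to a backup location.
dbutils.fs.cp("dbfs:/tmp", "dbfs:/backup/tmp", recurse=True)

# After verifying the backup, individual paths could then be removed, e.g.:
# dbutils.fs.rm("dbfs:/tmp/<path-to-remove>", recurse=True)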

Direo
by Contributor II
  • 12870 Views
  • 2 replies
  • 3 kudos
Latest Reply
User16873043212
New Contributor III
  • 3 kudos

@Direo Direo, yeah, this is a location inside your DBFS. You have full control over it; Databricks does not delete anything you keep in this location.

1 More Reply
hoopla
by New Contributor II
  • 7285 Views
  • 2 replies
  • 1 kudos

Unable to copy multiple files from file:/tmp to dbfs:/tmp

I am downloading multiple files by web scraping, and by default they are stored in /tmp. I can copy a single file by providing the filename and path:
%fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp
but when I try to copy multiple files I get an ...

Latest Reply
hoopla
New Contributor II
  • 1 kudos

Thanks Deepak. This is what I had suspected. Hopefully the wildcard feature will be available in the future. Thanks.
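Since %fs cp does not expand wildcards, one workaround is to glob the matching files on the driver and copy them one by one. A sketch in Python; the filename pattern is an assumption based on the example above:

import glob
import os

# Find all downloaded listings files in the driver's local /tmp.
for local_path in glob.glob("/tmp/*_listings.csv.gz"):
    file_name = os.path.basename(local_path)
    # Copy each file from the local filesystem into DBFS.
    dbutils.fs.cp("file:" + local_path, "dbfs:/tmp/" + file_name)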

1 More Reply