Save file to /tmp
07-10-2024 07:01 AM
Hello, I have Python code that collects data as JSON and sends it to an S3 bucket. Everything works fine, but when there is a lot of data it runs out of memory.
So I want to save the data locally first, for example in /tmp or dbfs:/tmp, and then send it to S3. But when saving, I get an error saying that the directory or file does not exist, as if the file is generated but cannot be found.
If I mount UC Volumes, then it works.
Are there any restrictions? I'm mounting everything via Unity Catalog, not via DBFS.
Thanks.
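For readers following the thread, the "save locally, then send to S3" idea from the question can be sketched roughly as below. This is only an illustration under assumptions, not the poster's code: the function names `write_json_lines` and `upload_and_remove` are hypothetical, the records are assumed to arrive as an iterable (ideally a generator, so they are never all held in memory), and `s3_client` is assumed to be a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method reads from the local file system.

```python
import json
import os
import tempfile


def write_json_lines(records, directory="/tmp"):
    """Stream records to a local newline-delimited JSON file, one at a time.

    Hypothetical helper: `records` can be a generator, so only one record
    needs to be in memory at any moment.
    """
    fd, path = tempfile.mkstemp(suffix=".jsonl", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def upload_and_remove(s3_client, path, bucket, key):
    """Upload the local file, then delete it once the upload has succeeded.

    Assumes `s3_client` behaves like a boto3 S3 client.
    """
    s3_client.upload_file(path, bucket, key)
    os.remove(path)
```

Usage would look something like `upload_and_remove(boto3.client("s3"), write_json_lines(rows), "my-bucket", "data/rows.jsonl")`, where the bucket and key are placeholders. Because both steps use plain Python file APIs, the path refers to the local disk of the node running the code, not to dbfs:/tmp.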
07-18-2024 09:45 AM
Hi @thiagoawstest,
Thank you for reaching out to our community! We're here to help you.
To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.
If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.
We appreciate your participation and are here to assist you further if you need it!
Thanks,
Rishabh
05-15-2025 07:10 PM
I am experiencing the same problem. I create a file in /tmp and can verify that it exists, but when I try to open the file with PySpark, it is not found. I noticed that the path I used to create the file is /tmp/foobar.parquet, while the path reported as not found is dbfs:/tmp/foobar.parquet.
05-16-2025 08:51 AM
I found what my problem was. I used pandas to save my parquet file to /tmp, so it was written to the /tmp folder on the compute node's local file system. When I passed the same path to PySpark to load the file, PySpark prepended 'dbfs:' to the path. The file wasn't in dbfs:/tmp, so the call failed. When I prepended 'file:' to the path that I passed to PySpark, the call succeeded.
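A minimal sketch of that fix, for anyone landing here later. The helper names `to_local_file_uri` and `roundtrip_via_local_tmp` are made up for illustration, `spark` is assumed to be an existing SparkSession, and `pandas_df` an existing pandas DataFrame with a parquet engine such as pyarrow installed.

```python
import os


def to_local_file_uri(path):
    """Prefix a local path with 'file:' so Spark does not resolve it against DBFS."""
    if path.startswith("file:"):
        return path  # already explicit, leave unchanged
    return "file:" + os.path.abspath(path)


def roundtrip_via_local_tmp(spark, pandas_df, path="/tmp/foobar.parquet"):
    """Write with pandas to local disk, then read the same file back with Spark.

    Hypothetical helper: `spark` is assumed to be a SparkSession.
    """
    pandas_df.to_parquet(path)  # pandas writes to the local file system
    return spark.read.parquet(to_local_file_uri(path))
```

One caveat, stated as an assumption rather than something tested in this thread: the file exists only on the node that wrote it, so on a multi-node cluster a 'file:' read may fail if other nodes need to open the path. A location that every node can reach, such as the UC Volume mentioned in the original question, avoids that.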