3 weeks ago
Hey everyone,
I have a pipeline that fetches data from S3 and stores it under the Databricks `.tmp/` folder.
The pipeline is always able to write around 200,000 files before I get a Permission Denied error. This happens in the following code block:
Any idea why that is the case?
Accepted Solutions
3 weeks ago
The "Permission Denied" error you are encountering when using `os.makedirs` to create directories under the Databricks `.tmp/` folder is likely due to concurrency issues or permission restrictions on the `.tmp/` directory.

Here are a few potential reasons and solutions:

- Concurrency Issues: If multiple tasks try to create directories at the same time, race conditions can occur. Similar issues were observed in Databricks Community and Slack discussions under high parallelism or with multiple tasks running concurrently. Adding some randomness to the directory names or implementing a retry mechanism can help mitigate this.
- Permission Restrictions: The `.tmp/` directory might have permission settings that prevent the creation of a large number of files or directories. Databricks Community threads report similar permission errors when creating directories on certain volumes or paths.
- Volume-Specific Issues: If you are using a specific S3 bucket or volume, there might be permission issues tied to that storage. In one Slack discussion, switching to a different bucket resolved the issue for another user.

To address the issue, you can try the following steps:

- Add Randomness: Modify your directory creation logic to include some randomness in the directory names to reduce the likelihood of collisions.
- Implement Retries: Add a retry mechanism to handle transient permission errors.
- Check Permissions: Ensure that the `.tmp/` directory and the underlying storage have the necessary permissions for creating directories.
- Use Databricks Utilities: Instead of `os.makedirs`, you can use `dbutils.fs.mkdirs`, which is designed to work with Databricks file systems and might handle permissions more gracefully.
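The randomness-plus-retry steps above could look roughly like this. This is a minimal sketch, not your pipeline's actual code: the `MeterReadContinuous` prefix is taken from the error message in this thread, while the retry count and backoff are illustrative choices.

```python
import os
import time
import uuid


def make_dir_with_retry(base=".tmp", prefix="MeterReadContinuous", retries=3):
    """Create a uniquely named directory under `base`, retrying on
    transient PermissionError. Retry count and backoff are illustrative."""
    for attempt in range(retries):
        # uuid4 adds randomness so concurrent tasks don't collide on a name
        path = os.path.join(base, f"{prefix}-{uuid.uuid4()}")
        try:
            os.makedirs(path, exist_ok=True)
            return path
        except PermissionError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # simple exponential backoff


# On Databricks, the equivalent with Databricks Utilities would be
# something like: dbutils.fs.mkdirs("dbfs:/tmp/MeterReadContinuous-...")
# (dbutils is only available inside a Databricks runtime, so it is
# commented out here).
```

If the error is caused by name collisions between concurrent tasks, the random suffix alone usually resolves it; the retry loop mainly helps with transient storage-layer errors.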
3 weeks ago
Can you share the specific error message you are receiving?
3 weeks ago
This is the error message I get: `[Errno 13] Permission denied: '.tmp/MeterReadContinuous-d7cc2215-5b75-419c-a843-06e712a94ac8'`
3 weeks ago
Thanks for your reply, Walter! The filenames are already unique, retries produce the same result, and I have the necessary permissions, as I was able to write the other 200,000 files (with the same program, which runs continuously).
It does make sense to use Databricks Utilities, however. I'll try it out and let you know.
Thanks!