Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Permission denied during write

Daan
New Contributor III

Hey everyone,

I have a pipeline that fetches data from S3 and stores it under the Databricks .tmp/ folder.
The pipeline always manages to write around 200,000 files before I get a Permission Denied error. The error occurs in the following code block:

os.makedirs(f".tmp/{filename_base[:-4]}", exist_ok=True)
There are no duplicates in filename_base[:-4].

Any idea why that is the case? 
 
Thanks!
1 ACCEPTED SOLUTION

Accepted Solutions

Walter_C
Databricks Employee

The "Permission Denied" error you are encountering when using os.makedirs to create directories under the Databricks .tmp/ folder is likely due to concurrency issues or permission restrictions on the .tmp/ directory.

Here are a few potential reasons and solutions:

  1. Concurrency Issues: If multiple tasks are trying to create directories at the same time, it can lead to race conditions. This is supported by the context from the Databricks Community and Slack discussions, where similar issues were observed when there was high parallelism or multiple tasks running concurrently. Adding some randomness to the directory names or implementing a retry mechanism can help mitigate this issue.

  2. Permission Restrictions: The .tmp/ directory might have specific permission settings that prevent the creation of a large number of files or directories. This is suggested by the context from the Databricks Community, where permission errors were encountered when trying to create directories on certain volumes or paths.

  3. Volume-Specific Issues: If you are using a specific S3 bucket or volume, there might be permission issues related to that storage. As seen in the Slack discussion, switching to a different bucket resolved the issue for another user.

To address the issue, you can try the following steps:

  • Add Randomness: Modify your directory creation logic to include some randomness in the directory names to reduce the likelihood of collisions.
  • Implement Retries: Add a retry mechanism to handle transient permission errors.
  • Check Permissions: Ensure that the .tmp/ directory and the underlying storage have the necessary permissions for creating directories.
  • Use Databricks Utilities: Instead of using os.makedirs, you can use dbutils.fs.mkdirs which is designed to work with Databricks file systems and might handle permissions more gracefully.
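The first two suggestions (randomness and retries) can be combined in a small helper. This is a minimal sketch, not code from the original thread: `make_dir_with_retry` is a hypothetical helper name, and the suffix length and backoff schedule are arbitrary choices.

```python
import os
import random
import string
import time

def make_dir_with_retry(base_path, name, retries=3, add_suffix=False):
    """Create a directory, retrying on transient PermissionError.

    Hypothetical helper (not a Databricks API). If add_suffix is True,
    a short random suffix is appended to the directory name to reduce
    the chance of collisions between concurrent tasks.
    """
    for attempt in range(retries):
        target = os.path.join(base_path, name)
        if add_suffix:
            suffix = "".join(random.choices(string.ascii_lowercase, k=6))
            target = f"{target}-{suffix}"
        try:
            os.makedirs(target, exist_ok=True)
            return target
        except PermissionError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # simple exponential backoff

# Applied to the original snippet, this would look roughly like:
# target = make_dir_with_retry(".tmp", filename_base[:-4], add_suffix=True)
```

For the dbutils route, the equivalent call would be along the lines of `dbutils.fs.mkdirs(path)`, which goes through the Databricks file system layer rather than the local OS and so may be subject to different permission handling.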


4 REPLIES

Walter_C
Databricks Employee

Can you share the specific error message you are receiving?

Daan
New Contributor III

This is the error message I get: [Errno 13] Permission denied: '.tmp/MeterReadContinuous-d7cc2215-5b75-419c-a843-06e712a94ac8'


Daan
New Contributor III

Thanks for your reply Walter! The filenames are already unique, retries produce the same result, and I have the necessary permissions, since I was able to write the other 200,000 files (with the same program, which runs continuously).
It does make sense to use Databricks Utilities, however. I will try it out and let you know.
Thanks!
