12-13-2023 12:24 AM
We are having intermittent errors where a Job Task cannot access a Catalog through a Volume, with the error: `PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'`. The Job has 40 tasks running in parallel, and every few runs this error appears in a different Task. Our workspace is on Azure and is managed with Terraform.
Stack trace:
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/watermark.py:137, in Watermark.write(self)
135 if not os.path.exists(self.path_base):
136 self.logger.debug(f"Creating directory: {self.path_base}")
--> 137 os.makedirs(self.path_base, exist_ok=True)
139 while current_retry < self.__max_retries and not success:
140 try:
File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
213 if head and tail and not path.exists(head):
214 try:
--> 215 makedirs(head, exist_ok=exist_ok)
216 except FileExistsError:
217 # Defeats race condition when another thread created the path
218 pass
File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
213 if head and tail and not path.exists(head):
214 try:
--> 215 makedirs(head, exist_ok=exist_ok)
216 except FileExistsError:
217 # Defeats race condition when another thread created the path
218 pass
[... skipping similar frames: makedirs at line 215 (2 times)]
File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
213 if head and tail and not path.exists(head):
214 try:
--> 215 makedirs(head, exist_ok=exist_ok)
216 except FileExistsError:
217 # Defeats race condition when another thread created the path
218 pass
File /usr/lib/python3.10/os.py:225, in makedirs(name, mode, exist_ok)
223 return
224 try:
--> 225 mkdir(name, mode)
226 except OSError:
227 # Cannot rely on checking for EEXIST, since the operating system
228 # could give priority to other errors like EACCES or EROFS
229 if not exist_ok or not path.isdir(name):
PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'
12-14-2023 10:23 PM - edited 12-14-2023 10:23 PM
Hi @DanR, The error message PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog' indicates that the Python process does not have the necessary permissions to create directories or perform certain operations on the specified path. This could be due to the operating system’s security settings or the user permissions of the process.
Here are a few potential solutions:
I hope this helps! Let me know if you have any other questions.
03-21-2024 04:27 PM
We are seeing a similar issue intermittently in Azure Databricks, and I've noticed it happens when parallelism is involved. The same wheel, when run at off-peak times, consistently finishes successfully; it only fails intermittently when the load is heavy.
Below is a sample error.
[Errno 13] Permission denied: '/local_disk0/spark-554a306a-e837-4180-a168-f5ee6e92bb75/trustedTemp-a74efb23-77ff-4d8f-b2bf-e96c708a0e96/tmpoz5zp7qn'"}
08-05-2024 12:59 PM
I've been encountering an issue when trying to create a folder on a Volume from a Python UDF. This usually works, but I've noticed that the likelihood of the error increases with the number of tasks on the same cluster. Could this be time-related, or some sort of race condition? I'm using `os.makedirs` with `exist_ok=True`, but it's possible that the Volume is raising a permissions error instead of a `FileExistsError`.
08-05-2024 03:45 PM
Hi @cronosnull ,
I think this is due to a race condition. Looking at the stack trace attached to the original post, multiple tasks are running at the same time and trying to write to the same path, so once in a while a `FileExistsError` can bubble up as a `PermissionError`. You could try adding some randomness to your UDF to lower the chance of collision.
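One way to add that randomness is a short random sleep before the directory creation, so parallel tasks are less likely to hit the same path at exactly the same instant. This is only a sketch of the suggestion above; the function name and jitter bound are illustrative, not a Databricks API:

```python
import os
import random
import time

def makedirs_with_jitter(path, max_jitter_s=0.5):
    """Create `path` after a random delay to spread out
    concurrent callers and reduce collisions.

    `max_jitter_s` is an illustrative default; tune it to
    your task count and tolerance for added latency.
    """
    time.sleep(random.uniform(0, max_jitter_s))
    os.makedirs(path, exist_ok=True)
```

Jitter only lowers the probability of a collision; it does not eliminate it, so it is best combined with a retry.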
08-05-2024 06:32 PM
It appears to be a concurrency limitation. There were fixes for this in the past, but it may be a new code flow. Adding a retry to the operation can mitigate the issue and serve as a workaround, but you should also report the issue to Databricks so the edge cases can be handled.
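The retry workaround mentioned above could look something like the following sketch, which treats `PermissionError` from the Volume as potentially transient and backs off between attempts. The function name, retry count, and delays are assumptions for illustration, not official guidance:

```python
import os
import time

def makedirs_with_retry(path, retries=5, base_delay_s=0.2):
    """Attempt os.makedirs, retrying on PermissionError.

    Under heavy parallel load, Volumes have been observed to
    surface a transient PermissionError instead of
    FileExistsError; retrying with exponential backoff usually
    succeeds on a later attempt. Raises on the final failure.
    """
    for attempt in range(retries):
        try:
            os.makedirs(path, exist_ok=True)
            return
        except PermissionError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            # Exponential backoff: 0.2s, 0.4s, 0.8s, ...
            time.sleep(base_delay_s * (2 ** attempt))
```

Note that a persistent permissions problem (e.g. a missing grant on the Volume) will still raise after the final attempt, which is the desired behavior.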