PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'

DanR
New Contributor II

We are having intermittent errors where a job task cannot access a catalog through a volume, failing with: `PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'`. The job has 40 tasks running in parallel, and every few runs the error appears in a different task. Our workspace is on Azure and is managed with Terraform.

Stack trace:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/watermark.py:137, in Watermark.write(self)
    135 if not os.path.exists(self.path_base):
    136     self.logger.debug(f"Creating directory: {self.path_base}")
--> 137     os.makedirs(self.path_base, exist_ok=True)
    139 while current_retry < self.__max_retries and not success:
    140     try:

File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
    213 if head and tail and not path.exists(head):
    214     try:
--> 215         makedirs(head, exist_ok=exist_ok)
    216     except FileExistsError:
    217         # Defeats race condition when another thread created the path
    218         pass

File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
    213 if head and tail and not path.exists(head):
    214     try:
--> 215         makedirs(head, exist_ok=exist_ok)
    216     except FileExistsError:
    217         # Defeats race condition when another thread created the path
    218         pass

    [... skipping similar frames: makedirs at line 215 (2 times)]

File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
    213 if head and tail and not path.exists(head):
    214     try:
--> 215         makedirs(head, exist_ok=exist_ok)
    216     except FileExistsError:
    217         # Defeats race condition when another thread created the path
    218         pass

File /usr/lib/python3.10/os.py:225, in makedirs(name, mode, exist_ok)
    223         return
    224 try:
--> 225     mkdir(name, mode)
    226 except OSError:
    227     # Cannot rely on checking for EEXIST, since the operating system
    228     # could give priority to other errors like EACCES or EROFS
    229     if not exist_ok or not path.isdir(name):

PermissionError: [Errno 1] Operation not permitted: '/Volumes/mycatalog'
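The traceback shows os.makedirs recursing from the watermark directory all the way up to '/Volumes/mycatalog' and finally calling mkdir on the catalog path itself. os.makedirs only recurses into a parent when os.path.exists reports that parent as missing, so at that instant the existing parent directories apparently were not visible to the task. A minimal sketch, assuming the volume root (a hypothetical /Volumes/mycatalog/myschema/myvolume) is known to exist: create only the levels below the root, one os.mkdir at a time, so a task never attempts to create the catalog directory. makedirs_under_root is a hypothetical helper, not part of the watermark library in the trace.

```python
import os
import tempfile


def makedirs_under_root(root: str, relative: str) -> str:
    """Create root/relative one level at a time without ever calling mkdir
    on `root` itself or on any directory above it.

    makedirs_under_root is a hypothetical helper, not part of the
    watermark library shown in the traceback.
    """
    if not os.path.isdir(root):
        # Fail with a clear message instead of letting os.makedirs walk up
        # and attempt mkdir("/Volumes/<catalog>"), which is not permitted.
        raise FileNotFoundError(f"volume root is not visible: {root}")
    parts = [p for p in os.path.normpath(relative).split(os.sep) if p not in ("", ".")]
    if ".." in parts:
        raise ValueError("relative path must stay inside the volume root")
    current = root
    for part in parts:
        current = os.path.join(current, part)
        try:
            os.mkdir(current)
        except FileExistsError:
            # Another task created this level first; accept it only if it
            # really is a directory.
            if not os.path.isdir(current):
                raise
    return current


if __name__ == "__main__":
    # A temporary directory stands in for a volume root such as the
    # hypothetical /Volumes/mycatalog/myschema/myvolume.
    with tempfile.TemporaryDirectory() as root:
        print(makedirs_under_root(root, "watermarks/task_01"))
```

This does not make a transiently invisible volume visible again; it only turns the failure into an explicit, retryable error instead of an attempt to create a catalog.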


Mohamed_Deyab
New Contributor II

We are having a similar intermittent issue in Azure Databricks. I've noticed it happens when parallelism is involved: the same wheel consistently finishes successfully when run at off-peak times and only fails, intermittently, under heavy load.

Below is a sample error.

[Errno 13] Permission denied: '/local_disk0/spark-554a306a-e837-4180-a168-f5ee6e92bb75/trustedTemp-a74efb23-77ff-4d8f-b2bf-e96c708a0e96/tmpoz5zp7qn'"}


cronosnull
New Contributor II

I've been encountering an issue when trying to create a folder on a volume from a Python UDF. This usually works, but the likelihood of the error increases with the number of tasks running on the same cluster. Is it possible that the issue is timing-related, or could it be some sort of race condition? I'm using os.makedirs with exist_ok set to True, but it's possible that the volumes are raising a permission error instead of a FileExistsError.
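One way to test the race-condition hypothesis is to have many workers create the same nested path at the same instant and count which exceptions escape. Note that os.makedirs(exist_ok=True) already tolerates any OSError, PermissionError included, when the directory exists afterwards (the `except OSError` branch at the bottom of the traceback in the original post), so an error that escapes means the path did not look like a directory at that moment. This sketch uses threads in one process as a stand-in for parallel Spark tasks, which is an assumption; stress_makedirs and its parameters are hypothetical.

```python
import os
import tempfile
import threading
from collections import Counter


def stress_makedirs(base: str, workers: int = 40, rounds: int = 20) -> Counter:
    """Start `workers` threads that call os.makedirs(target, exist_ok=True)
    on the same fresh target at the same moment, repeat `rounds` times, and
    count the exception types that escape."""
    errors: Counter = Counter()
    lock = threading.Lock()

    def worker(target: str, barrier: threading.Barrier) -> None:
        barrier.wait()  # release all threads together to maximise contention
        try:
            os.makedirs(target, exist_ok=True)
        except OSError as exc:
            with lock:
                errors[type(exc).__name__] += 1

    for i in range(rounds):
        # A new nested path each round so every round races on creation.
        target = os.path.join(base, f"round_{i}", "a", "b")
        barrier = threading.Barrier(workers)
        threads = [threading.Thread(target=worker, args=(target, barrier)) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return errors


if __name__ == "__main__":
    # Local baseline; point `base` at a scratch folder inside a volume
    # (hypothetical path) to compare the two.
    with tempfile.TemporaryDirectory() as base:
        print(stress_makedirs(base))
```

On a local disk this should report no errors; running it against a volume path is what would show whether the mount behaves differently under contention.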

szymon_dybczak
Esteemed Contributor III

Hi @cronosnull ,

I think this is due to a race condition. Looking at the stack trace attached to the original post, multiple tasks are running at the same time and trying to write to the same path, so once in a while there could be a FileExistsError that bubbles up as a permission error. You can try adding some randomness to your UDF to lower the chance of a collision.
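Assuming "randomness" here means a random delay before the directory is created (one reading of the suggestion), a minimal sketch looks like this; makedirs_with_jitter and max_jitter_s are hypothetical names, not Databricks settings.

```python
import os
import random
import tempfile
import time


def makedirs_with_jitter(path: str, max_jitter_s: float = 2.0) -> None:
    """Wait a random 0..max_jitter_s seconds, then create `path`.

    Spreading out the start times makes it less likely that many parallel
    tasks issue mkdir on the same directory at the same instant.
    max_jitter_s is a hypothetical tuning knob, not a Databricks setting.
    """
    time.sleep(random.uniform(0, max_jitter_s))
    os.makedirs(path, exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "watermarks")
        makedirs_with_jitter(target, max_jitter_s=0.1)
        print(os.path.isdir(target))
```

Jitter only lowers the odds of a collision. If the directory layout allows it, giving each task its own directory (for example with a uuid4 suffix) removes the shared mkdir altogether.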

NandiniN
Databricks Employee

It appears to be a concurrency limitation. There have been fixes for this in the past, but this may be a new code path. Adding a retry to the operation can mitigate the issue as a workaround. You can also report the issue to Databricks so the edge cases can be handled.
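In the original traceback, Watermark.write already has a retry loop, but os.makedirs is called before that loop starts, so the directory creation itself is never retried. A minimal sketch of the retry workaround, assuming the failure is transient so that a later attempt succeeds; makedirs_with_retry, max_retries and base_delay_s are hypothetical names and values, not Databricks settings.

```python
import os
import random
import tempfile
import time


def makedirs_with_retry(path: str, max_retries: int = 5, base_delay_s: float = 0.5) -> None:
    """Call os.makedirs(path, exist_ok=True), retrying on PermissionError with
    exponential backoff plus random jitter. The defaults are guesses."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            os.makedirs(path, exist_ok=True)
            return
        except PermissionError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            # Wait base, 2*base, 4*base, ... plus jitter so parallel tasks
            # do not all retry at the same moment.
            delay = base_delay_s * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        makedirs_with_retry(os.path.join(base, "watermarks"))
```

Retrying only on PermissionError, the error reported in this thread, means other OSErrors still fail immediately. It is a workaround, not a fix for the underlying platform behaviour.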
