Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

matt_t
New Contributor

Trying to sync one folder from an external S3 bucket to a folder on a mounted S3 bucket, running some simple code on Databricks to accomplish this. The data is a bunch of CSVs and PSVs.

The only problem is that some of the files fail with this error saying the operation is not supported. From some digging, it seems other folks have hit this error and it appears to be related to using the mount. Some posts note that you can't do random writes or appends on the mount (https://kb.databricks.com/dbfs/errno95-operation-not-supported.html). Most of the files transferred successfully, but roughly 15 remain untransferred, even though files with similar formats and similar sizes went through. The failures are also largely deterministic: re-running the sync gives me the same list of problematic files.

I know one possible workaround is to copy these failed files to a temp folder on my cluster first and then do a second S3 copy to the target bucket.
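That two-step workaround would look roughly like this (just a sketch: the bucket names are hypothetical, and it shells out to the AWS CLI so the FUSE mount is never touched):

```python
import subprocess

def s3_cp_cmd(src, dst):
    # Build the AWS CLI command for a single-object copy.
    return ["aws", "s3", "cp", src, dst]

def copy_via_local(src_bucket, key, dst_bucket, tmp_dir="/tmp"):
    # Step 1: pull the object down to local disk on the driver.
    # Step 2: push it from local disk straight to the target bucket,
    # bypassing the mounted path entirely.
    local_path = f"{tmp_dir}/{key.rsplit('/', 1)[-1]}"
    subprocess.run(s3_cp_cmd(f"s3://{src_bucket}/{key}", local_path), check=True)
    subprocess.run(s3_cp_cmd(local_path, f"s3://{dst_bucket}/{key}"), check=True)

# Usage (hypothetical bucket/key):
# copy_via_local("external-source-bucket", "folder/data_01.csv", "target-bucket")
```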

However, the heart of my question is: WHY are these files not supported? (I don't think it's an append issue like the problem I linked above.) Any ideas? Side note: we are also confused because we do this operation (copying from a bucket to a mounted bucket) quite often in our org, so what makes this case different?

Thank you for your time!

1 ACCEPTED SOLUTION

Accepted Solutions

AmanSehgal
Honored Contributor III

@Matthew Tribby, can you try the following:

Copy the problematic files to a separate bucket and transfer just those files. See if the error persists.

If it does, then the files themselves probably have some issue.

The issue could be around file size, as the docs state: `This works for small files, but quickly becomes an issue as file size increases.` Is it possible for you to split up the files, or maybe try increasing resources on the cluster?

On a side note: Why can't you use `aws s3 sync s3://mybucket s3://mybucket2` ?


2 REPLIES


Atanu
Databricks Employee

@Matthew Tribby, does the above suggestion work? Please let us know if you need further help on this. Thanks.
