S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

matt_t
New Contributor

I'm trying to sync one folder from an external S3 bucket to a folder on a mounted S3 bucket, using some simple code on Databricks. The data is a bunch of CSVs and PSVs.

The only problem is that some of the files raise an error saying the operation is not supported. From some digging, it seems other folks have hit this error and it appears to be related to using the mount: some posts note that you can't do random writes or appends on a mount (https://kb.databricks.com/dbfs/errno95-operation-not-supported.html). Most of the files transferred successfully, but roughly 15 remain untransferred, even though files of similar format and size went through. The failures are also largely deterministic: re-running the sync gives me the same list of problematic files.

I know one possible workaround is to first copy these failed files to a temp folder on my cluster and then do another S3 copy to the target bucket.
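A minimal sketch of that two-phase workaround, with hypothetical local paths: `shutil` stands in for `dbutils.fs.cp`, and in the real case step 2 would be an S3 upload from local scratch (e.g. boto3 or `dbutils.fs.cp` from `file:/`) rather than another write through the mount. The fallback only triggers on `[Errno 95]`, which is `errno.EOPNOTSUPP` on Linux:

```python
import errno
import os
import shutil
import tempfile

def copy_with_fallback(src: str, dst: str) -> None:
    """Copy src to dst; on [Errno 95] (operation not supported),
    stage the file in local scratch first, then write it to the
    destination in one sequential pass -- the pattern that suits
    S3 mounts, which reject random writes and appends."""
    try:
        shutil.copyfile(src, dst)
    except OSError as e:
        if e.errno != errno.EOPNOTSUPP:  # [Errno 95] on Linux
            raise
        with tempfile.TemporaryDirectory() as scratch:
            staged = os.path.join(scratch, os.path.basename(src))
            shutil.copyfile(src, staged)   # 1. stage on local disk
            shutil.copyfile(staged, dst)   # 2. in reality: one S3 upload

# Demo with ordinary local paths standing in for the mount:
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "data.psv")
    dst = os.path.join(d, "copy.psv")
    with open(src, "w") as f:
        f.write("a|b|c\n1|2|3\n")
    copy_with_fallback(src, dst)
    with open(dst) as f:
        print(f.read())  # prints the copied PSV contents
```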

However, the heart of my question is: WHY are these files not supported? (I don't think it's an append issue like the problem I linked above.) Any ideas?

Side note: we're also confused because we do this operation, copying from a bucket to a mounted bucket, quite often in our org. What makes this case different?

Thank you for your time!

1 ACCEPTED SOLUTION

Accepted Solutions

AmanSehgal
Honored Contributor III

@Matthew Tribby​ can you try the following:

Copy the problematic files to a separate bucket and transfer just those files. See if the error persists.

If it does, then the files themselves probably have some issue.

The issue could also be around file size, as the docs state that `This works for small files, but quickly becomes an issue as file size increases.` Is it possible for you to split up the files, or maybe try to increase resources on the cluster?
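If size is the culprit, splitting before the transfer could be sketched like this (hypothetical chunk size; plain Python on local paths, since the split itself is independent of the mount). The header is repeated in each part so every chunk stays independently readable:

```python
import os
import tempfile

def split_csv(path: str, rows_per_chunk: int) -> list[str]:
    """Split a CSV/PSV into smaller part files, repeating the
    header in each part so every chunk is independently readable."""
    parts = []
    with open(path) as f:
        header = f.readline()
        buf, idx = [], 0

        def flush():
            nonlocal buf, idx
            out = f"{path}.part{idx}"
            with open(out, "w") as o:
                o.write(header)
                o.writelines(buf)
            parts.append(out)
            buf, idx = [], idx + 1

        for line in f:
            buf.append(line)
            if len(buf) == rows_per_chunk:
                flush()
        if buf:          # leftover rows form the final part
            flush()
    return parts

# Demo on a throwaway file: 5 data rows in chunks of 2 -> 3 parts.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "big.csv")
    with open(src, "w") as f:
        f.write("a,b\n" + "".join(f"{i},{i}\n" for i in range(5)))
    print(len(split_csv(src, 2)))  # 3
```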

On a side note: why can't you use `aws s3 sync s3://mybucket s3://mybucket2`?


3 REPLIES

Kaniz
Community Manager

Hi @Matthew Tribby​! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, I will get back to you soon. Thanks.


Atanu
Esteemed Contributor

@Matthew Tribby​ does the above suggestion work? Please let us know if you need further help with this. Thanks.
