Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

matt_t
New Contributor

Trying to sync one folder from an external S3 bucket to a folder on a mounted S3 bucket, running some simple code on Databricks to accomplish this. The data is a bunch of CSVs and PSVs.

The only problem is that some of the files fail with this error saying the operation is not supported. From some digging, it seems other folks have hit this error and it appears to be related to using the mount. Some posts note that you can't do random writes or appends on the mount (https://kb.databricks.com/dbfs/errno95-operation-not-supported.html). Most of the files transferred successfully, but roughly 15 remain untransferred, even though files with similar formats and similar sizes went through. The failures are also largely deterministic: re-running the sync gives me the same list of problematic files.

I know one possible workaround is to copy these failed files to a temp folder on my cluster first and then do a second S3 copy to the target bucket.
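That two-step workaround would look roughly like this (just a sketch: the bucket names are hypothetical, and it shells out to the AWS CLI so the FUSE mount is never touched):

```python
import subprocess

def s3_cp_cmd(src, dst):
    # Build the AWS CLI command for a single-object copy.
    return ["aws", "s3", "cp", src, dst]

def copy_via_local(src_bucket, key, dst_bucket, tmp_dir="/tmp"):
    # Step 1: pull the object down to local disk on the driver.
    # Step 2: push it from local disk straight to the target bucket,
    # bypassing the mounted path entirely.
    local_path = f"{tmp_dir}/{key.rsplit('/', 1)[-1]}"
    subprocess.run(s3_cp_cmd(f"s3://{src_bucket}/{key}", local_path), check=True)
    subprocess.run(s3_cp_cmd(local_path, f"s3://{dst_bucket}/{key}"), check=True)

# Usage (hypothetical bucket/key):
# copy_via_local("external-source-bucket", "folder/data_01.csv", "target-bucket")
```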

However, the heart of my question is: WHY are these files not supported? (I don't think it's an append issue like the problem I linked above.) Any ideas? Side note: we are also confused because we do this operation (copying from a bucket to a mounted bucket) quite often in our org, so what makes this case different?

Thank you for your time!

1 ACCEPTED SOLUTION

Accepted Solutions

AmanSehgal
Honored Contributor III

@Matthew Tribby, can you try the following:

Copy the problematic files to a separate bucket and transfer just those files. See if the error persists.

If it does, then the files themselves probably have some issue.

The issue could be around file size, as the docs state: `This works for small files, but quickly becomes an issue as file size increases.` Is it possible for you to split up the files, or maybe try increasing resources on the cluster?

On a side note: Why can't you use `aws s3 sync s3://mybucket s3://mybucket2` ?


2 REPLIES


Atanu
Databricks Employee

@Matthew Tribby, does the above suggestion work? Please let us know if you need further help on this. Thanks.
