Hello. I have a file that is over 100 GB. Sometimes it sits on the cluster's local path, and sometimes it is on a volume.
I want to copy it to another path on the volume, or to an S3 bucket.
This is my code:

dbutils.fs.cp('file:///tmp/test.txt', '/Volumes/catalog/schema/path/')
However, when I try to copy a file larger than 100 GB, it fails with this error:

IllegalArgumentException: partNumber must be between 1 and 10000 inclusive, but is 10001
From what I found, dbutils (Spark) splits the file into 10 MB (10,485,760-byte) parts when transferring it to a volume or S3, and S3 multipart uploads allow at most 10,000 parts, so a 100 GB file needs more parts than the limit allows.
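Just to spell out the arithmetic (plain Python; the 100 GB figure is my rough file size, not an exact number):

import math

file_size = 100 * 1024**3        # ~100 GB (approximate size of my file)
default_part = 10 * 1024**2      # 10 MB parts, which is what dbutils appears to use
bigger_part = 100 * 1024**2      # 100 MB parts, which is what I tried to configure

print(math.ceil(file_size / default_part))  # 10240 -> over the 10,000-part limit
print(math.ceil(file_size / bigger_part))   # 1024  -> comfortably under the limit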
So I set spark.hadoop.fs.s3a.multipart.size to 104857600 (100 MB).
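For context, a notebook-level way to apply that setting would be something like the snippet below (a sketch; I am not sure whether either form is actually honored by dbutils.fs.cp, which is part of my question):

# Attempt 1: set it as a Spark conf using the spark.hadoop. prefix.
spark.conf.set("spark.hadoop.fs.s3a.multipart.size", "104857600")

# Attempt 2: set the Hadoop configuration key directly on the active context.
spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.multipart.size", "104857600")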
However, when I run dbutils.fs.cp, it still appears to upload in 10 MB parts, and the same error occurs again.
I say this because 10 MB files keep being created under '/tmp/hadoop-root/s3a' during the copy.
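As a sanity check, I assume the effective value can be read back from the Hadoop configuration with something like this (not verified that this is the configuration dbutils.fs.cp actually consults):

# Read back the effective S3A multipart size from the active Hadoop configuration.
print(spark.sparkContext._jsc.hadoopConfiguration().get("fs.s3a.multipart.size"))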
Am I misunderstanding something?
Or is it simply not possible to move a file larger than 100 GB with dbutils.fs.cp?