We are doing a DBFS migration. In the legacy workspace, the root DBFS contains a folder, 'user', that holds 5.8 TB of data. We ran AWS CLI sync/cp from the legacy bucket to the target bucket, and then ran the same commands again from the target bucket to the target DBFS.
Using this approach, we migrated the folders under /mnt and /dbfs-root to the target root bucket. While migrating /dbfs-root (user, FileStore, home), we ran into a problem: copying /dbfs/user is very slow.
Folder sizes:
- /user: 5.8 TB
- /home: 680 GB
- /FileStore: 181 GB
Note: the slowdown occurs only in the second step, copying from the target S3 bucket to /dbfs/user.
Status update on /dbfs/user so far:
- Data migrated: 750 GB of 5.8 TB
- Completion: ~12.9%
- Data transferred by aws s3 sync so far: ~403 GB
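As a quick sanity check of the status figures, the sketch below recomputes the completion rate and estimates the remaining time at the ~200 GB per day throughput reported for /user. It assumes 1 TB = 1000 GB and a constant rate; the function name `progress` is hypothetical.

```python
# Figures taken from the status update; 1 TB is treated as 1000 GB (assumption).
TOTAL_GB = 5800
MIGRATED_GB = 750
RATE_GB_PER_DAY = 200  # observed throughput for /dbfs/user


def progress(total_gb: float, migrated_gb: float, rate_gb_per_day: float) -> dict:
    """Return completion percentage, remaining GB, and days left at a constant rate."""
    remaining_gb = total_gb - migrated_gb
    return {
        "completion_pct": round(100 * migrated_gb / total_gb, 1),
        "remaining_gb": remaining_gb,
        "days_left": remaining_gb / rate_gb_per_day,
    }


if __name__ == "__main__":
    print(progress(TOTAL_GB, MIGRATED_GB, RATE_GB_PER_DAY))
```

With these inputs, about 5050 GB remain, which at a steady 200 GB per day is a little over 25 days.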
We are puzzled because this happens only for /user, where throughput is very slow (around 200 GB a day). /home and /FileStore did not show this slowdown.
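One way to compare /user with /home and /FileStore is to profile file counts and average file sizes per top-level folder, since a copy's throughput can depend on the number of objects as well as on total bytes. This is an assumption worth checking, not something the figures above confirm. The minimal sketch below parses the output of `aws s3 ls --recursive` (date, time, size, key per line) and assumes that object keys begin with the folder name (`user/`, `home/`, `FileStore/`); both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_s3_ls_line(line: str) -> Tuple[str, int]:
    """Parse one line of `aws s3 ls --recursive` output into (key, size_bytes)."""
    _date, _time, size, key = line.split(maxsplit=3)  # key may contain spaces
    return key, int(size)


def profile_by_top_folder(entries: Iterable[Tuple[str, int]]) -> Dict[str, dict]:
    """Group (key, size_bytes) pairs by first path segment; count files and bytes."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [files, bytes]
    for key, size in entries:
        top = key.lstrip("/").split("/", 1)[0]
        totals[top][0] += 1
        totals[top][1] += size
    return {
        folder: {"files": n, "bytes": b, "avg_bytes": b / n}
        for folder, (n, b) in totals.items()
    }
```

For example, after `aws s3 ls s3://<target-bucket>/ --recursive > listing.txt` (the bucket name is a placeholder), pass `parse_s3_ls_line(l) for l in open("listing.txt") if l.strip()` to `profile_by_top_folder` and compare the `files` and `avg_bytes` values across folders.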
Please suggest best practices for mounting or migrating the /user folder to the target workspace, given this data volume.
Methods already used:
- dbutils.fs.cp()
- aws s3 sync
- aws s3 cp
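All three methods above copy /user as a single job. A common alternative is to split the tree into independent subfolders and copy them concurrently. The sketch below does that with a thread pool and takes the copy operation as a parameter, so it can run with any copy function. The function name `copy_prefixes_in_parallel` and the `max_workers` default are hypothetical. Whether this helps depends on where the bottleneck actually is.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def copy_prefixes_in_parallel(
    prefixes: Iterable[str],
    src_root: str,
    dst_root: str,
    copy_fn: Callable[[str, str], object],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run copy_fn(src, dst) for each prefix concurrently; return a status per prefix."""
    src_root = src_root.rstrip("/")
    dst_root = dst_root.rstrip("/")
    status: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per subfolder, e.g. <src_root>/alice -> <dst_root>/alice.
        futures = {
            pool.submit(copy_fn, f"{src_root}/{p}", f"{dst_root}/{p}"): p
            for p in prefixes
        }
        for fut in as_completed(futures):
            prefix = futures[fut]
            try:
                fut.result()
                status[prefix] = "ok"
            except Exception as exc:  # record the failure and keep going
                status[prefix] = f"failed: {exc}"
    return status
```

In a Databricks notebook, one might build `prefixes` from `[f.name.rstrip("/") for f in dbutils.fs.ls("s3://<target-bucket>/user/")]` and pass `copy_fn=lambda s, d: dbutils.fs.cp(s, d, recurse=True)` with `dst_root="dbfs:/user"` (the bucket name is a placeholder). Failed prefixes can then be retried individually instead of restarting the whole 5.8 TB copy.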