Query on DBFS migration
08-23-2022 09:36 AM
We are doing a DBFS migration. The root DBFS of our legacy workspace has a folder 'user' holding 5.8 TB of data. We ran AWS CLI sync/cp from legacy to target, and then ran the same from the target bucket to the target DBFS.
Using this technique we migrated the folders under /mnt and /dbfs-root to the target root bucket. While migrating /dbfs-root (user, FileStore, home) we hit a problem: the copy into /dbfs/user is very slow.
/user - 5.8TB
/home - 680 GB
/FileStore - 181 GB
Note: the slowness occurs only when migrating from the target S3 bucket to /dbfs/user.
Status update on /dbfs/user so far:
Data migration status: 750 GB / 5.8 TB
Completion rate: ~12.9%
Data transferred by aws s3 sync so far: ~403 GB
We are curious because this happens only for /user, which moves at around 200 GB a day. /home and /FileStore did not behave this way.
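To put the figures above in perspective, here is a rough back-of-the-envelope estimate of the time remaining at the observed rate. It is only a sketch: it assumes decimal units (1 TB = 1000 GB) and that the ~200 GB/day rate stays constant, neither of which is confirmed in the thread. The variable names are illustrative.

```python
# Figures taken from the status update; decimal units are an assumption.
TOTAL_GB = 5800          # 5.8 TB in /user
DONE_GB = 750            # migrated so far
RATE_GB_PER_DAY = 200    # observed throughput, assumed constant

remaining_gb = TOTAL_GB - DONE_GB
days_left = remaining_gb / RATE_GB_PER_DAY
completion_pct = DONE_GB / TOTAL_GB * 100

print(f"Completed: {completion_pct:.1f}%")
print(f"Remaining: {remaining_gb} GB, about {days_left:.1f} days at the current rate")
```

At this pace the remaining data would take several more weeks, which is why a faster copy method matters here.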
Please suggest best practices for mounting the /user folder in the target workspace, given these data sizes.
Methods already used:
- dbutils.fs.cp()
- aws s3 sync
- aws s3 cp
08-23-2022 10:40 AM
dbutils.fs.cp() and the other dbutils commands will be slow because they use only a single core.
Consider using AWS DataSync: shorturl.at/FNQTV
My blog: https://databrickster.medium.com/
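Since the single-core limitation is the point made above, one possible workaround is to issue many per-file copies concurrently instead of one recursive copy. The following is a minimal sketch of that idea, not a confirmed Databricks recipe: `parallel_copy` is a hypothetical helper, and the demonstration uses local temporary files with `shutil.copy` standing in for the real S3 and DBFS paths.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_copy(pairs, copy_fn, max_workers=16):
    """Run copy_fn(src, dst) for each pair on a thread pool.

    Returns a list of (source, exception) tuples for copies that failed.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_fn, src, dst): src for src, dst in pairs}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # record the failure, keep copying the rest
                failures.append((futures[future], exc))
    return failures


# Local demonstration: temporary directories stand in for bucket and DBFS.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
names = [f"part-{i:05d}.txt" for i in range(20)]
for name in names:
    with open(os.path.join(src_dir, name), "w") as fh:
        fh.write(name)

pairs = [(os.path.join(src_dir, n), os.path.join(dst_dir, n)) for n in names]
failed = parallel_copy(pairs, shutil.copy, max_workers=8)
copied = sorted(os.listdir(dst_dir))
print(len(copied), "files copied,", len(failed), "failed")
```

In a notebook, the same helper could presumably be given `dbutils.fs.cp` as `copy_fn`, with the pairs built from a file listing. Whether that actually improves throughput for /user is an assumption that would need to be tested on a small subfolder first.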
08-24-2022 05:43 AM
Thanks for the quick response.
Regarding the suggested AWS DataSync approach: we have tried DataSync in several ways, but it creates the folders in the S3 bucket itself, not in DBFS, whereas our task is to copy from the bucket to DBFS.
It seems to support only bucket-level operations, not DBFS-level ones.
Please suggest any best practices or approaches that would meet our needs. That would be a great help. Thanks.