<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Query on DBFS migration in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/query-on-dbfs-migration/m-p/33758#M24696</link>
    <description>&lt;P&gt;Thanks for the quick response.&lt;/P&gt;&lt;P&gt;Regarding the suggested &lt;B&gt;AWS DataSync&lt;/B&gt; &lt;A href="https://shorturl.at/FNQTV" alt="https://shorturl.at/FNQTV" target="_blank"&gt;approach&lt;/A&gt;: we have tried &lt;B&gt;DataSync&lt;/B&gt; in multiple ways, but it creates the folders in the S3 bucket itself, &lt;B&gt;not on DBFS&lt;/B&gt;. Our task is to copy from the bucket to DBFS.&lt;/P&gt;&lt;P&gt;It seems that DataSync only &lt;B&gt;supports bucket-level operations, not DBFS-level ones.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please suggest any best practices or approaches that can meet our needs. That would be a great help. Thanks.&lt;/P&gt;</description>
    <pubDate>Wed, 24 Aug 2022 12:43:27 GMT</pubDate>
    <dc:creator>Harsh1</dc:creator>
    <dc:date>2022-08-24T12:43:27Z</dc:date>
    <item>
      <title>Query on DBFS migration</title>
      <link>https://community.databricks.com/t5/data-engineering/query-on-dbfs-migration/m-p/33756#M24694</link>
      <description>&lt;P&gt;We are doing a DBFS migration. The legacy workspace has a folder '&lt;B&gt;user&lt;/B&gt;' in the DBFS root holding &lt;B&gt;5.8 TB&lt;/B&gt; of data. We ran &lt;B&gt;AWS CLI sync/cp&lt;/B&gt; from &lt;B&gt;legacy to target&lt;/B&gt;, and then ran the same from the &lt;B&gt;target bucket to the target DBFS&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Using this technique, we migrated the folders under /mnt and /dbfs-root to the target root bucket. While migrating /dbfs-root (user, FileStore, home), we ran into a problem: the copy is very slow for &lt;B&gt;/dbfs/user&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;B&gt;/user - 5.8 TB&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;/home - 680 GB&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;/FileStore - 181 GB&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Note - &lt;/B&gt;It is slow only when migrating from the &lt;B&gt;target S3&lt;/B&gt; bucket to &lt;B&gt;/dbfs/user&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Status update on /dbfs/user so far:&lt;/P&gt;&lt;P&gt;Data migration status - 750 GB / 5.8 TB&lt;/P&gt;&lt;P&gt;Completion rate ~12.9%&lt;/P&gt;&lt;P&gt;Data transferred by AWS sync so far: ~403 GB&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We are puzzled because this happens only for /user, which copies at around 200 GB a day. This was not the case for /home and /FileStore.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please suggest best practices for mounting the /user folder in the target workspace given this volume of data.&lt;/P&gt;&lt;P&gt;Methods already used:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;dbutils.fs.cp()&lt;/LI&gt;&lt;LI&gt;aws s3 sync&lt;/LI&gt;&lt;LI&gt;aws s3 cp&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Tue, 23 Aug 2022 16:36:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-on-dbfs-migration/m-p/33756#M24694</guid>
      <dc:creator>Harsh1</dc:creator>
      <dc:date>2022-08-23T16:36:02Z</dc:date>
    </item>
    <item>
      <title>Re: Query on DBFS migration</title>
      <link>https://community.databricks.com/t5/data-engineering/query-on-dbfs-migration/m-p/33757#M24695</link>
      <description>&lt;P&gt;dbutils.fs.cp() and other dbutils commands will be slow because they use only a single core.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Consider using AWS DataSync: shorturl.at/FNQTV&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
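      <!--
      Editorial sketch, not part of the original reply. One common way around the
      single-core limit mentioned above is to split the copy into many smaller
      copies and run them concurrently from the driver. The helper names
      parallel_copy and build_pairs are hypothetical, and the copy function is
      passed in as a parameter so the sketch runs outside Databricks; whether
      threads actually speed up a given S3 to DBFS copy is an assumption to
      verify on your own cluster.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_pairs(subfolders, source_root, target_root):
    """Pair each top-level subfolder under source_root with its target path."""
    return [
        (f"{source_root.rstrip('/')}/{name}", f"{target_root.rstrip('/')}/{name}")
        for name in subfolders
    ]


def parallel_copy(pairs, copy_fn, max_workers=8):
    """Run copy_fn(source, destination) for every pair on a thread pool.

    Returns (copied, failed): the destinations that succeeded, and a list of
    (source, error message) tuples for the copies that raised.
    """
    copied, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its pair so results can be reported by path.
        futures = {pool.submit(copy_fn, src, dst): (src, dst) for src, dst in pairs}
        for future in as_completed(futures):
            src, dst = futures[future]
            try:
                future.result()
                copied.append(dst)
            except Exception as exc:
                # Record the failure and keep going so one bad path
                # does not stop the rest of the run.
                failed.append((src, str(exc)))
    return copied, failed
```

      In a notebook, copy_fn could be something like
      lambda src, dst: dbutils.fs.cp(src, dst, True), with pairs built from the
      subfolders of /user; the bucket and folder names are yours to supply.
      Failed sources are returned so they can be retried separately.
      -->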
      <pubDate>Tue, 23 Aug 2022 17:40:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-on-dbfs-migration/m-p/33757#M24695</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-08-23T17:40:08Z</dc:date>
    </item>
    <item>
      <title>Re: Query on DBFS migration</title>
      <link>https://community.databricks.com/t5/data-engineering/query-on-dbfs-migration/m-p/33758#M24696</link>
      <description>&lt;P&gt;Thanks for the quick response.&lt;/P&gt;&lt;P&gt;Regarding the suggested &lt;B&gt;AWS DataSync&lt;/B&gt; &lt;A href="https://shorturl.at/FNQTV" alt="https://shorturl.at/FNQTV" target="_blank"&gt;approach&lt;/A&gt;: we have tried &lt;B&gt;DataSync&lt;/B&gt; in multiple ways, but it creates the folders in the S3 bucket itself, &lt;B&gt;not on DBFS&lt;/B&gt;. Our task is to copy from the bucket to DBFS.&lt;/P&gt;&lt;P&gt;It seems that DataSync only &lt;B&gt;supports bucket-level operations, not DBFS-level ones.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please suggest any best practices or approaches that can meet our needs. That would be a great help. Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2022 12:43:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-on-dbfs-migration/m-p/33758#M24696</guid>
      <dc:creator>Harsh1</dc:creator>
      <dc:date>2022-08-24T12:43:27Z</dc:date>
    </item>
  </channel>
</rss>

