<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Spacy Retraining failure in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/spacy-retraining-failure/m-p/67738#M3234</link>
    <description/>
    <pubDate>Tue, 30 Apr 2024 19:05:02 GMT</pubDate>
    <dc:creator>Kumaran</dc:creator>
    <dc:date>2024-04-30T19:05:02Z</dc:date>
    <item>
      <title>Spacy Retraining failure</title>
      <link>https://community.databricks.com/t5/machine-learning/spacy-retraining-failure/m-p/66786#M3213</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm having problems running my retraining notebook for a spaCy model. The notebook creates a shell file with the following lines of code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt; cmd = f'''
    awk '{{sub("source = ","source = /dbfs/FileStore/{dbfs_folder}/textcat/categories/model_{model_id}/model-best")}}1' config_.cfg &amp;gt; /dbfs/FileStore/{dbfs_folder}/temp/config.cfg
    '''
    
    f.write(cmd)
    
    cmd = f'''
    python -m spacy train /dbfs/FileStore/{dbfs_folder}/temp/config.cfg --output "/dbfs/FileStore/{dbfs_folder}/temp/model2" --paths.train "/dbfs/FileStore/{dbfs_folder}/temp/train_imbalanced.spacy" --paths.dev "/dbfs/FileStore/{dbfs_folder}/temp/eval.spacy"
    '''
    
    f.write(cmd)
    
    cmd = f'''
    awk '{{sub("source = ","source = /dbfs/FileStore/{dbfs_folder}/temp/model2/model-best")}}1' config_.cfg &amp;gt; /dbfs/FileStore/{dbfs_folder}/temp/config.cfg
    '''
    
    f.write(cmd)
    
    cmd = f'''
    python -m spacy train /dbfs/FileStore/{dbfs_folder}/temp/config.cfg --output "/dbfs/FileStore/{dbfs_folder}/temp/model2" --paths.train "/dbfs/FileStore/{dbfs_folder}/temp/train_semibalanced.spacy" --paths.dev "/dbfs/FileStore/{dbfs_folder}/temp/eval.spacy"
    '''
    
    f.write(cmd)
    
    cmd = f'''
    python -m spacy train /dbfs/FileStore/{dbfs_folder}/temp/config.cfg --output "/dbfs/FileStore/{dbfs_folder}/temp/model2" --paths.train "/dbfs/FileStore/{dbfs_folder}/temp/train_balanced.spacy" --paths.dev "/dbfs/FileStore/{dbfs_folder}/temp/eval.spacy"
    '''
    
    f.write(cmd)&lt;/LI-CODE&gt;&lt;P&gt;It then runs it with&amp;nbsp;&lt;SPAN&gt;%sh /dbfs/FileStore/&lt;/SPAN&gt;&lt;SPAN&gt;"$fld"&lt;/SPAN&gt;&lt;SPAN&gt;/temp/train.sh&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;From what I can tell, spacy train uses shutil.copytree, which no longer seems to work on files stored in DBFS. It returns the error&lt;/P&gt;&lt;LI-CODE lang="python"&gt;shutil.Error: [('/dbfs/FileStore/Prod/temp/model2/model-last/config.cfg', '/dbfs/FileStore/Prod/temp/model2/model-best/config.cfg', '[Errno 1] Operation not permitted')&lt;/LI-CODE&gt;&lt;P&gt;for each file in the tree. This notebook was working the last time we ran it, which was about 10 months ago. Any ideas what could be going wrong?&lt;/P&gt;</description>
      <pubDate>Fri, 19 Apr 2024 17:29:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/spacy-retraining-failure/m-p/66786#M3213</guid>
      <dc:creator>AndersenHuang</dc:creator>
      <dc:date>2024-04-19T17:29:45Z</dc:date>
    </item>
    <item>
      <title>Re: Spacy Retraining failure</title>
      <link>https://community.databricks.com/t5/machine-learning/spacy-retraining-failure/m-p/67738#M3234</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103796"&gt;@AndersenHuang&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Thank you for contacting Databricks community support.&lt;/P&gt;
&lt;P&gt;The error message you're encountering suggests that there's a permission issue when trying to copy the files. It's possible that the permissions for the directory&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;/dbfs/FileStore/Prod/temp/model2/&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;have changed, or that there's been an update to the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;shutil&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;library or the way it interacts with the DBFS file system.&lt;/P&gt;
&lt;P&gt;One possible solution is to use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;dbutils.fs.cp&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;command instead of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;shutil.copytree&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to copy the files.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;dbutils.fs.cp&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is a Databricks utility function that can be used to copy files within the DBFS file system. Here's an example of how you could modify your code to use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;dbutils.fs.cp&lt;/CODE&gt;&lt;/SPAN&gt;:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN&gt;# dbutils.fs addresses DBFS with dbfs:/ URIs rather than the /dbfs mount path
dbutils.fs.cp("dbfs:/FileStore/Prod/temp/model2/model-last/config.cfg", "dbfs:/FileStore/Prod/temp/model2/model-best/config.cfg")&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
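&lt;P&gt;The paths in the error message use the &lt;CODE&gt;/dbfs/...&lt;/CODE&gt; mount form, while &lt;CODE&gt;dbutils.fs&lt;/CODE&gt; commands address DBFS with &lt;CODE&gt;dbfs:/...&lt;/CODE&gt; URIs. As a rough sketch, a small helper could convert one form into the other before calling &lt;CODE&gt;dbutils.fs.cp&lt;/CODE&gt;. The name &lt;CODE&gt;to_dbfs_uri&lt;/CODE&gt; is hypothetical and not part of any Databricks API.&lt;/P&gt;

```python
import posixpath

# Mount point under which DBFS is exposed to local file APIs.
FUSE_PREFIX = "/dbfs"


def to_dbfs_uri(local_path):
    """Convert a /dbfs/... mount path into a dbfs:/... URI (hypothetical helper)."""
    normalized = posixpath.normpath(local_path)
    if normalized == FUSE_PREFIX:
        return "dbfs:/"
    if not normalized.startswith(FUSE_PREFIX + "/"):
        raise ValueError(f"not a /dbfs mount path: {local_path!r}")
    return "dbfs:" + normalized[len(FUSE_PREFIX):]


src = to_dbfs_uri("/dbfs/FileStore/Prod/temp/model2/model-last/config.cfg")
dst = to_dbfs_uri("/dbfs/FileStore/Prod/temp/model2/model-best/config.cfg")
print(src)
print(dst)

# In a Databricks notebook, where dbutils is predefined, the copy would then be:
# dbutils.fs.cp(src, dst)
```

&lt;P&gt;The &lt;CODE&gt;dbutils.fs.cp&lt;/CODE&gt; call is left as a comment because &lt;CODE&gt;dbutils&lt;/CODE&gt; only exists inside a Databricks notebook or job.&lt;/P&gt;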
&lt;P&gt;Another possible solution is to check the permissions of the directory&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;/dbfs/FileStore/Prod/temp/model2/&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and make sure that the user running the notebook has the necessary permissions to read and write to the directory.&lt;/P&gt;
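&lt;P&gt;One way to run that check from a notebook cell is sketched below, using only the Python standard library. The helper name &lt;CODE&gt;describe_access&lt;/CODE&gt; is hypothetical, and the result only reflects what the operating system reports for the path, which may not cover every restriction DBFS applies.&lt;/P&gt;

```python
import os
import stat
import tempfile


def describe_access(path):
    """Report the mode bits of path and what the current process may do with it."""
    info = os.stat(path)
    return {
        "path": path,
        "mode": stat.filemode(info.st_mode),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        "searchable": os.access(path, os.X_OK),
    }


# Demonstrated on a temporary directory. On the cluster, the argument would be
# the directory from the error message: "/dbfs/FileStore/Prod/temp/model2".
with tempfile.TemporaryDirectory() as demo_dir:
    report = describe_access(demo_dir)
    print(report)
```

&lt;P&gt;If &lt;CODE&gt;readable&lt;/CODE&gt; and &lt;CODE&gt;writable&lt;/CODE&gt; both come back true for the model directory, plain permissions are less likely to be the cause.&lt;/P&gt;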
&lt;P&gt;It's also possible that a change in the way&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;shutil&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;interacts with the DBFS file system is causing the issue. Because&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;shutil&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is part of the Python standard library, it cannot be downgraded on its own: its version follows the Python version, which on Databricks is determined by the Databricks Runtime version. You could try running the notebook on the runtime version that was in use when it last worked to see if that resolves the issue.&lt;/P&gt;
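&lt;P&gt;One hypothesis, which the error message alone does not confirm, is that DBFS rejects the metadata step of the copy rather than the data itself. By default &lt;CODE&gt;shutil.copytree&lt;/CODE&gt; copies files with &lt;CODE&gt;shutil.copy2&lt;/CODE&gt; and calls &lt;CODE&gt;shutil.copystat&lt;/CODE&gt; on directories, both of which write permissions and timestamps. The sketch below copies file contents only. The name &lt;CODE&gt;copy_tree_contents_only&lt;/CODE&gt; is hypothetical, and the temporary directories merely stand in for &lt;CODE&gt;model-last&lt;/CODE&gt; and &lt;CODE&gt;model-best&lt;/CODE&gt;.&lt;/P&gt;

```python
import os
import shutil
import tempfile


def copy_tree_contents_only(src, dst):
    """Recreate the directory tree under dst and copy file bytes only.

    Nothing here calls shutil.copystat or shutil.copy2, so no permissions
    or timestamps are written to the destination.
    """
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            shutil.copyfile(os.path.join(root, name), os.path.join(target_dir, name))


# Demonstration with temporary directories standing in for the model folders.
with tempfile.TemporaryDirectory() as work:
    model_last = os.path.join(work, "model-last")
    model_best = os.path.join(work, "model-best")
    os.makedirs(os.path.join(model_last, "textcat"))
    with open(os.path.join(model_last, "config.cfg"), "w") as fh:
        fh.write("[paths]\n")
    with open(os.path.join(model_last, "textcat", "model"), "w") as fh:
        fh.write("weights")

    copy_tree_contents_only(model_last, model_best)

    # Collect the copied files as paths relative to the destination.
    copied = sorted(
        os.path.relpath(os.path.join(root, name), model_best)
        for root, _dirs, files in os.walk(model_best)
        for name in files
    )
    print(copied)
```

&lt;P&gt;Since the copy in question happens inside &lt;CODE&gt;spacy train&lt;/CODE&gt;, a helper like this cannot be swapped in there directly. It could instead be used to move the output to &lt;CODE&gt;/dbfs&lt;/CODE&gt; after training into a directory on the cluster's local disk, assuming local disk space is sufficient for the model.&lt;/P&gt;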
&lt;P&gt;Finally, it's worth noting that the error message you're encountering is not specific to Spacy, but rather to the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;shutil&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;library and the way it interacts with the DBFS file system. Therefore, the solution to this issue may not be specific to Spacy either.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2024 19:05:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/spacy-retraining-failure/m-p/67738#M3234</guid>
      <dc:creator>Kumaran</dc:creator>
      <dc:date>2024-04-30T19:05:02Z</dc:date>
    </item>
  </channel>
</rss>

