Spacy Retraining failure

AndersenHuang
New Contributor

Hello,

I'm having problems trying to run my retraining notebook for a spaCy model. The notebook creates a shell file with the following lines of code:

    # Point "source = " in config_.cfg at the current model and write the result to temp/config.cfg
    cmd = f'''
    awk '{{sub("source = ","source = /dbfs/FileStore/{dbfs_folder}/textcat/categories/model_{model_id}/model-best")}}1' config_.cfg > /dbfs/FileStore/{dbfs_folder}/temp/config.cfg
    '''
    f.write(cmd)

    # First training pass, using the imbalanced training set
    cmd = f'''
    python -m spacy train /dbfs/FileStore/{dbfs_folder}/temp/config.cfg --output "/dbfs/FileStore/{dbfs_folder}/temp/model2" --paths.train "/dbfs/FileStore/{dbfs_folder}/temp/train_imbalanced.spacy" --paths.dev "/dbfs/FileStore/{dbfs_folder}/temp/eval.spacy"
    '''
    f.write(cmd)

    # Re-point "source = " at the model just produced in temp/model2
    cmd = f'''
    awk '{{sub("source = ","source = /dbfs/FileStore/{dbfs_folder}/temp/model2/model-best")}}1' config_.cfg > /dbfs/FileStore/{dbfs_folder}/temp/config.cfg
    '''
    f.write(cmd)

    # Second training pass, using the semi-balanced training set
    cmd = f'''
    python -m spacy train /dbfs/FileStore/{dbfs_folder}/temp/config.cfg --output "/dbfs/FileStore/{dbfs_folder}/temp/model2" --paths.train "/dbfs/FileStore/{dbfs_folder}/temp/train_semibalanced.spacy" --paths.dev "/dbfs/FileStore/{dbfs_folder}/temp/eval.spacy"
    '''
    f.write(cmd)

    # Third training pass, using the balanced training set
    cmd = f'''
    python -m spacy train /dbfs/FileStore/{dbfs_folder}/temp/config.cfg --output "/dbfs/FileStore/{dbfs_folder}/temp/model2" --paths.train "/dbfs/FileStore/{dbfs_folder}/temp/train_balanced.spacy" --paths.dev "/dbfs/FileStore/{dbfs_folder}/temp/eval.spacy"
    '''
    f.write(cmd)

The notebook then runs the script with %sh /dbfs/FileStore/"$fld"/temp/train.sh

From what I am able to tell, spacy train uses shutil.copytree, which doesn't seem to work anymore when I use it on files stored in DBFS. It returns the error

shutil.Error: [('/dbfs/FileStore/Prod/temp/model2/model-last/config.cfg', '/dbfs/FileStore/Prod/temp/model2/model-best/config.cfg', '[Errno 1] Operation not permitted')

for each file in the tree. This notebook was working the last time we ran it, about 10 months ago. Any ideas what could be going wrong?
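
For reference, the failing copy can be reproduced outside of spacy train with a minimal sketch like the one below (assuming the same DBFS FUSE paths as in the error; "model-best-copy" is just a throwaway destination name):

import shutil

# Reproduction sketch: copying a directory tree within the /dbfs FUSE mount.
# The per-file "[Errno 1] Operation not permitted" seems to come from the metadata
# calls (copystat/chmod) that shutil.copytree makes by default, which /dbfs rejects.
shutil.copytree(
    "/dbfs/FileStore/Prod/temp/model2/model-last",
    "/dbfs/FileStore/Prod/temp/model2/model-best-copy",
)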

 


Kumaran
Valued Contributor III

Hi @AndersenHuang,

Thank you for contacting Databricks community support.

The error message you're encountering suggests that there's a permission issue when trying to copy the files. It's possible that the permissions for the directory /dbfs/FileStore/Prod/temp/model2/ have changed, or that there's been an update to Python's shutil module or to the way its file operations behave on the DBFS file system.

One possible solution is to use the dbutils.fs.cp command instead of shutil.copytree to copy the files. dbutils.fs.cp is a Databricks utility function that can be used to copy files within the DBFS file system. Here's an example of how you could modify your code to use dbutils.fs.cp:

 

 
dbutils.fs.cp("dbfs:/FileStore/Prod/temp/model2/model-last/config.cfg", "dbfs:/FileStore/Prod/temp/model2/model-best/config.cfg")
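
Note that dbutils.fs expects DBFS URIs (dbfs:/...) rather than the local /dbfs mount paths that shutil uses. If the goal is to mirror the whole model-last directory rather than a single file, dbutils.fs.cp can also copy recursively; a minimal sketch, assuming the same folder layout as above:

# Recursively copy the whole directory tree (recurse=True), not just one file
dbutils.fs.cp("dbfs:/FileStore/Prod/temp/model2/model-last", "dbfs:/FileStore/Prod/temp/model2/model-best", recurse=True)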

Another possible solution is to check the permissions of the directory /dbfs/FileStore/Prod/temp/model2/ and make sure that the user running the notebook has the necessary permissions to read and write to the directory.
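
A quick way to inspect this from the notebook is a %sh cell that lists the directory (using the same path as in your error message), for example:

%sh ls -la /dbfs/FileStore/Prod/temp/model2/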

It's also possible that an update to Python's shutil module, or to the way the DBFS file system handles the file operations it performs, is causing the issue. Since shutil ships with the Python standard library, you could try a cluster with an older Databricks Runtime (and therefore an older Python version) to see if that resolves the issue.

Finally, it's worth noting that the error message you're encountering is not specific to spaCy, but rather to the shutil library and the way it interacts with the DBFS file system. Therefore, the solution to this issue may not be specific to spaCy either.
