Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Load the HF pipeline in databricks

Mahsa
New Contributor

Hi all, 

I have a question about the integration of HF in Databricks.

I'm struggling to save the models and datasets:

For instance, for the code below, I get this error:

ValueError: Could not load model nickwong64/bert-base-uncased-poems-sentiment with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>, <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'>). See the original errors:
Does anyone know how I can solve this issue?

from transformers import pipeline

sentiment_classifier = pipeline(
    task="text-classification",
    model="nickwong64/bert-base-uncased-poems-sentiment",
    model_kwargs={'cache_dir': '/Volumes/dsa_development/belgium_data/model_dir/hf_cache'}
)

 

2 REPLIES

dkushari
Databricks Employee

Hi @Mahsa, can you use the local disk as a cache instead of a Volume? It should work. Please see below:

%pip install -U "transformers==4.44.2" "huggingface_hub>=0.20.0" accelerate datasets evaluate torch safetensors
dbutils.library.restartPython()


from transformers import pipeline

sentiment_classifier = pipeline(
    task="text-classification",
    model="nickwong64/bert-base-uncased-poems-sentiment",
    trust_remote_code=True,
    model_kwargs={'cache_dir': '/local_disk0/tmp/hf_cache'}
)
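
If you still need the downloaded files to persist in your Volume, one option (a sketch, not an official Databricks pattern) is to load with the local-disk cache as above and then copy the cache into the Volume afterwards; `sync_cache` below is a hypothetical helper, and the demo uses temp dirs in place of the real paths:

```python
import os
import shutil
import tempfile

def sync_cache(local_cache: str, volume_cache: str) -> None:
    """Copy the Hugging Face cache from fast local disk to a persistent location."""
    shutil.copytree(local_cache, volume_cache, dirs_exist_ok=True)

# Demo with temp dirs; on Databricks the source would be
# "/local_disk0/tmp/hf_cache" and the destination your Volumes path.
src = tempfile.mkdtemp()
dst = os.path.join(tempfile.mkdtemp(), "hf_cache")
open(os.path.join(src, "model.safetensors"), "w").close()  # stand-in cache file
sync_cache(src, dst)
print(os.listdir(dst))  # the copied file(s)
```

On the next run you can point cache_dir back at the Volume copy, or copy it to local disk first for faster loads.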

 

Thompson2345
New Contributor II

 

The error happens because the model "nickwong64/bert-base-uncased-poems-sentiment" isn't correctly registered as a SequenceClassification model in Hugging Face. You can try:

  1. Use AutoModelForSequenceClassification explicitly:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

    model = AutoModelForSequenceClassification.from_pretrained(
        "nickwong64/bert-base-uncased-poems-sentiment",
        cache_dir="/Volumes/dsa_development/belgium_data/model_dir/hf_cache"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "nickwong64/bert-base-uncased-poems-sentiment",
        cache_dir="/Volumes/dsa_development/belgium_data/model_dir/hf_cache"
    )
    sentiment_classifier = pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer
    )

  2. Check the model card: make sure the model actually supports "text-classification"/SequenceClassification. Some HF models are only trained as AutoModel and need a wrapper for classification.

  3. Environment path: ensure Databricks can access the specified cache_dir and that it's mounted correctly.

This approach explicitly loads the model and tokenizer and usually resolves the "Could not load model" issue in Databricks.
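
A quick way to sanity-check the cache path before loading the model (a minimal stdlib sketch; the helper name and probe file are my own, and the demo uses a temp dir rather than your actual Volumes path):

```python
import os
import tempfile

def writable_cache_dir(path: str) -> bool:
    """Return True if `path` exists (or can be created) and is writable."""
    try:
        os.makedirs(path, exist_ok=True)
        probe = os.path.join(path, ".write_probe")
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        return False

# Demo with a temp dir; on Databricks you would check
# "/local_disk0/tmp/hf_cache" or your Volumes path instead.
demo_dir = os.path.join(tempfile.gettempdir(), "hf_cache_check")
print(writable_cache_dir(demo_dir))
```

If this prints False for your Volumes path, the cache_dir is the problem rather than the model class.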
