Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Custom sentence transformer for indexing

Ulfzerk
New Contributor

Hi!

I would like to use my own sentence transformer to create a vector index.

Using the MLflow sentence-transformers flavour is not a problem; it works fine with:

mlflow.sentence_transformers.log_model(
    model,
    artifact_path="model",
    signature=signature,
    input_example=sentences,
    registered_model_name=registered_model_name,
)

model_uri = f"runs:/{run.info.run_id}/model"
registered_model = mlflow.register_model(
    model_uri=model_uri,
    name=registered_model_name,
)

What I want to use is the pyfunc flavour, because I want to add an optional preprocessing step as additional functionality glued to the model.

Unfortunately, I can't find any documentation or reference on which methods a custom mlflow.pyfunc.PythonModel should implement.
I tried something like this:

import mlflow.pyfunc
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
class MyDataModel(BaseModel):
    field1: str
    field2: int
    field3: float

def process_object(obj: MyDataModel) -> str:
    return f"{obj.field1} {obj.field2} {obj.field3}"

class CustomSentenceTransformerModel(mlflow.pyfunc.PythonModel):

    def load_context(self, context):
        # Load the Sentence Transformer model
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def process_object(self, obj: MyDataModel):
        # Define your custom processing here
        return f"Processed object with value: {obj}"

    def predict(self, context, model_input):
        # This method is required for MLflow's pyfunc models
        return self.model.encode(model_input)

    def encode(self, input):
        return self.model.encode(input)
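For what it's worth, the preprocessing "glue" I have in mind can be sketched independently of MLflow and model serving. This is a minimal, self-contained illustration only: the dataclass fields and the stub encoder are placeholders for demonstration, not the real SentenceTransformer.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MyDataModel:
    field1: str
    field2: int
    field3: float

def process_object(obj: MyDataModel) -> str:
    # Flatten the structured record into a single string for the encoder
    return f"{obj.field1} {obj.field2} {obj.field3}"

def encode_stub(sentences: List[str]) -> List[List[float]]:
    # Stand-in for SentenceTransformer.encode: one toy vector per input,
    # here just the length of each preprocessed sentence
    return [[float(len(s))] for s in sentences]

def predict(records: List[MyDataModel]) -> List[List[float]]:
    # Preprocess each record, then encode the resulting sentences
    sentences = [process_object(r) for r in records]
    return encode_stub(sentences)

vectors = predict([MyDataModel("a", 1, 2.0)])
```

In the real pyfunc model, predict() would perform this same preprocess-then-encode chain, with encode_stub replaced by the loaded SentenceTransformer.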

Yet it is not possible to use it for indexing tables.
I know that I could just run a notebook that creates a new column with vector embeddings, but that's not the point here.

I just get this error:

Index creation failed
Failed to call Model Serving endpoint: embedding_pyfunc.

Without any further justification or logs at all!

0 REPLIES 0