Hi!
I would like to use my own sentence transformer to create a vector index.
It is not a problem when using the MLflow sentence-transformers flavour; it works fine with:
mlflow.sentence_transformers.log_model(
    model,
    artifact_path="model",
    signature=signature,
    input_example=sentences,
    registered_model_name=registered_model_name,
)

model_uri = f"runs:/{run.info.run_id}/model"
registered_model = mlflow.register_model(
    model_uri=model_uri,
    name=registered_model_name,
)
What I want to use is the pyfunc flavour, because I want to add an optional preprocessing step as additional functionality that is glued to the model.
Unfortunately, I can't find any documentation or reference on which methods a custom mlflow.pyfunc.PythonModel should implement.
I tried something like this:
import mlflow.pyfunc
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer


class MyDataModel(BaseModel):
    field1: str
    field2: int
    field3: float


def process_object(obj: MyDataModel) -> str:
    return f"{obj.field1} {obj.field2} {obj.field3}"


class CustomSentenceTransformerModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the Sentence Transformer model
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def process_object(self, obj: MyDataModel):
        # Define your custom processing here
        return f"Processed object with value: {obj}"

    def predict(self, context, model_input):
        # This method is required for MLflow's pyfunc models
        return self.model.encode(model_input)

    def encode(self, input):
        return self.model.encode(input)
Yet it is not possible to use it for indexing tables.
I know that I could just run a notebook that creates a new column with vector embeddings, but that's not the point here.
I just get the error:
Index creation failed
Failed to call Model Serving endpoint: embedding_pyfunc.
Without any justification or logs at all!
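To be explicit about what I mean by glueing the preprocessing to the model: ideally predict would accept either plain sentences or my structured records, and flatten the latter into strings before encoding. A rough sketch of that intent (the field names are placeholders, not a working solution):

import mlflow.pyfunc
import pandas as pd

class CustomSentenceTransformerModel(mlflow.pyfunc.PythonModel):
    # load_context / process_object as above ...

    def predict(self, context, model_input):
        # If the input arrives as a DataFrame with my structured fields,
        # flatten each row into a single string before encoding;
        # otherwise treat the input as a plain list of sentences.
        if isinstance(model_input, pd.DataFrame) and {"field1", "field2", "field3"} <= set(model_input.columns):
            texts = [f"{r.field1} {r.field2} {r.field3}" for r in model_input.itertuples(index=False)]
        else:
            texts = [str(t) for t in model_input]
        return self.model.encode(texts)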