Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Custom sentence transformer for indexing

Ulfzerk
New Contributor

Hi!

I would like to use my own sentence transformer to create a vector index.

Logging it with the MLflow sentence-transformers flavour is not a problem; it works fine with:

import mlflow

with mlflow.start_run() as run:
    mlflow.sentence_transformers.log_model(
        model,
        artifact_path="model",
        signature=signature,
        input_example=sentences,
        registered_model_name=registered_model_name)

    model_uri = f"runs:/{run.info.run_id}/model"
    registered_model = mlflow.register_model(
        model_uri=model_uri,
        name=registered_model_name)

What I want to use is the pyfunc flavour, because I want to add an optional preprocessing step as additional functionality glued to the model.

Unfortunately, I can't find any documentation or reference on what methods a custom mlflow.pyfunc.PythonModel should implement.
I tried something like this:

import mlflow.pyfunc
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
class MyDataModel(BaseModel):
    field1: str
    field2: int
    field3: float

def process_object(obj: MyDataModel) -> str:
    return f"{obj.field1} {obj.field2} {obj.field3}"

class CustomSentenceTransformerModel(mlflow.pyfunc.PythonModel):

    def load_context(self, context):
        # Load the Sentence Transformer model
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def process_object(self, obj: MyDataModel):
        # Define your custom processing here
        return f"Processed object with value: {obj}"

    def predict(self, context, model_input):
        # This method is required for MLflow's pyfunc models
        return self.model.encode(model_input)
    
    def encode(self, input):
        return self.model.encode(input)

Yet it is not possible to use it for indexing tables.
I know that I could just run a notebook that creates a new column with vector embeddings, but that's not the point here.

I just get this error:

Index creation failed
Failed to call Model Serving endpoint: embedding_pyfunc.

Without any justification or logs at all!

 

1 REPLY 1

mark_ott
Databricks Employee

To use a custom MLflow pyfunc model for sentence-transformers with preprocessing, you need to comply with the expected interface of mlflow.pyfunc.PythonModel, especially the predict method. The method signature, data handling, and serialization are the key points. Below is a direct answer with a practical explanation and guidelines.

Required Methods for mlflow.pyfunc.PythonModel

The only method you must implement is predict(self, context, model_input).

  • context: MLflow-provided info (artifacts, configs, etc.).

  • model_input: The input passed during inference (usually Pandas DataFrame, NumPy array, or Python native types).

Guidelines and Typical Pattern

  • Load everything needed in load_context, which runs once when the model is loaded by MLflow.

  • Accept both batch (DataFrame/array) and single-input cases in predict.

  • The output of predict should be directly serializable (ideally array-like or DataFrame).

Example Template

python
import mlflow.pyfunc
import pandas as pd
from sentence_transformers import SentenceTransformer

class CustomSentenceTransformerModel(mlflow.pyfunc.PythonModel):

    def load_context(self, context):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def preprocess(self, row):
        # Custom preprocessing - join columns, etc.
        return f"{row['field1']} {row['field2']} {row['field3']}"

    def predict(self, context, model_input):
        # Accept DataFrame, Series, or list
        # If DataFrame, apply preprocessing row by row
        if isinstance(model_input, pd.DataFrame):
            texts = model_input.apply(self.preprocess, axis=1).tolist()
        elif isinstance(model_input, list):
            texts = [self.preprocess(x) if isinstance(x, dict) else x for x in model_input]
        else:
            texts = [str(model_input)]
        return self.model.encode(texts)

Key Points for Indexing Tables

  • When serving/inferencing, the input must be a DataFrame, array, or compatible structure; MLflow Model Serving expects this.

  • If you want to process tables, accept a DataFrame in predict, preprocess each row, and then encode.

  • All logic for optional preprocessing must be inside predict.
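The row-wise handling described in these points can be sketched without any MLflow or model dependencies. This is a minimal sketch, assuming the serving layer hands predict a pandas DataFrame; the field1/field2/field3 column names are taken from the original question:

```python
import pandas as pd

def preprocess(row) -> str:
    # Join the columns into a single text string, mirroring the
    # question's process_object; works for a dict or a pandas row
    return f"{row['field1']} {row['field2']} {row['field3']}"

def to_texts(model_input) -> list:
    # Normalize the input structures Model Serving may pass to predict()
    if isinstance(model_input, pd.DataFrame):
        return model_input.apply(preprocess, axis=1).tolist()
    if isinstance(model_input, list):
        return [preprocess(x) if isinstance(x, dict) else str(x) for x in model_input]
    return [str(model_input)]

df = pd.DataFrame([{"field1": "hello", "field2": 3, "field3": 4.2}])
print(to_texts(df))  # ['hello 3 4.2']
```

Once the input is normalized to a flat list of strings, the actual model call reduces to self.model.encode(texts), regardless of which structure the caller sent.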

Troubleshooting the "Index creation failed" Error

  • The error likely means predict does not consume the input structure as expected, or the output is not serializable.

  • Ensure you return standard Python objects (lists, arrays, DataFrames); avoid returning custom objects or types that cannot be serialized easily.

  • Check that your model serving environment has all dependencies (sentence-transformers, etc.).
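To make the serializability point concrete, here is one sketch, assuming encode() returns a NumPy array (as sentence-transformers does by default): convert it to nested Python lists before returning from predict, so the serving layer can JSON-encode the response.

```python
import numpy as np

def serialize_embeddings(embeddings: np.ndarray) -> list:
    # Model Serving JSON-encodes the response; ndarray.tolist() yields
    # plain nested Python lists of floats, which serialize cleanly
    return embeddings.tolist()

vecs = serialize_embeddings(np.array([[0.1, 0.2], [0.3, 0.4]]))
print(vecs)  # [[0.1, 0.2], [0.3, 0.4]]
```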

Final Recommendations

  • Implement only load_context and predict, where predict handles any preprocessing.

  • Return vector outputs in formats compatible with downstream tooling (usually NumPy arrays or lists).

  • Test your model locally first:

    python
    import pandas as pd

    data = pd.DataFrame([{"field1": "hello", "field2": 3, "field3": 4.2}])
    model = CustomSentenceTransformerModel()
    model.load_context(None)  # load the encoder before predicting locally
    model.predict(None, data)
