How to enable word_timestamps=True when querying a Whisper model deployed in Databricks?

JoaoPigozzo
New Contributor II

I’ve deployed OpenAI’s Whisper model as a serving endpoint in Databricks and I’m trying to transcribe an audio file. Locally, I can enable word-level timestamps like this:

import whisper
 
model = whisper.load_model("small")
transcript = model.transcribe(
    word_timestamps=True,
    audio="path/to/audio"
)

In Databricks, I query the model like this:

response = workspace_client.serving_endpoints.query(
    name="whisperv3",
    inputs=[base64_audio_chunks[first_audio_key]]
)
print(response.predictions[0])

My question is: How can I enable word_timestamps=True when querying the model in Databricks?
Is there a way to pass this option in the inputs or another parameter? 

4 REPLIES

lingareddy_Alva
Honored Contributor II

Hi @JoaoPigozzo 

To enable word_timestamps=True when querying your Whisper model deployed as a serving endpoint in Databricks,
you must modify the serving endpoint’s inference logic to accept and process this parameter.

1. Update your model serving function to accept word_timestamps
Inside the model serving code you deployed (likely a Python function or MLflow model),
update the handler to read the flag from the input payload:

def predict(model_input):
    import base64
    import tempfile

    import whisper

    # Decode the base64-encoded audio sent in the request payload
    audio_data = base64.b64decode(model_input["audio"])

    # whisper.load_audio expects a file path (it decodes via ffmpeg),
    # so write the decoded bytes to a temporary file first
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(audio_data)
        tmp.flush()
        audio = whisper.load_audio(tmp.name)

    # Optional flag from the payload; defaults to False
    word_ts = model_input.get("word_timestamps", False)

    model = whisper.load_model("small")
    result = model.transcribe(audio=audio, word_timestamps=word_ts)

    return result

 

Make sure the model is packaged as a custom Python function model or an MLflow pyfunc, with this logic in its predict() (or model.predict()) method.
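For reference, a minimal pyfunc wrapper along those lines might look like the following. This is only a sketch: the class name WhisperTranscriber and the "small" model size are assumptions, and the serving environment needs the ffmpeg binary available for Whisper to decode audio.

import base64
import tempfile

import mlflow.pyfunc


class WhisperTranscriber(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import whisper
        # Load the model once at endpoint startup rather than per request
        self.model = whisper.load_model("small")

    def predict(self, context, model_input):
        # model_input is built from the JSON payload sent to the endpoint
        audio_data = base64.b64decode(model_input["audio"])
        word_ts = model_input.get("word_timestamps", False)

        # Whisper decodes audio from a file path via ffmpeg
        with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
            tmp.write(audio_data)
            tmp.flush()
            return self.model.transcribe(tmp.name, word_timestamps=word_ts)

Loading the model in load_context() keeps the expensive model load at endpoint startup instead of repeating it on every request.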

 

2. Pass the parameter in your Databricks query:

response = workspace_client.serving_endpoints.query(
    name="whisperv3",
    inputs={
        "audio": base64_audio_chunks[first_audio_key],
        "word_timestamps": True
    }
)
print(response.predictions[0])

Notes:
If you deploy with MLflow, this logic must live in the predict() method of your PythonModel wrapper.
If you serve the model through a custom serving handler (e.g., Model Serving behind a REST API), the server must parse the incoming JSON and pass word_timestamps through to the Whisper call.
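To apply such an update programmatically, a hedged sketch of logging and registering the wrapper with MLflow could look like this (the artifact path and registered model name below are placeholders):

import mlflow

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="whisper_transcriber",      # placeholder artifact path
        python_model=WhisperTranscriber(),        # the wrapper sketched above
        registered_model_name="whisper_word_ts",  # placeholder registry name
        pip_requirements=["openai-whisper"],
    )

Once a new version is registered, you would point the serving endpoint at it (through the Serving UI or the endpoint update API) so the new predict() logic takes effect.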

LR

Hello @lingareddy_Alva

Thank you for your response. I have deployed the Whisper model using the Serving UI. Do I need to deploy it with MLflow or serve the model with a REST API?

Hi @JoaoPigozzo 

Since you've already deployed the Whisper model using the Serving UI in Databricks,
whether you also need to use MLflow or a REST API depends on your use case.


What the Serving UI Gives You:
When you deploy a model (like Whisper) using Databricks Model Serving UI, Databricks:
1. Automatically creates a REST API endpoint
2. Handles scaling, versioning, and deployment
3. Allows token-based secure access
4. Provides a sample curl command, Python code, and OpenAPI spec

So you do not need to redeploy it with MLflow unless you need MLflow tracking or experiments.
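For example, that auto-created REST endpoint can be called directly; a minimal sketch using the requests library (the workspace host, token, and payload values are placeholders) might look like:

import requests

url = "https://<workspace-host>/serving-endpoints/whisperv3/invocations"
headers = {"Authorization": "Bearer <databricks-token>"}

# The payload shape must match whatever the endpoint's predict() expects
payload = {"inputs": {"audio": "<base64-audio>", "word_timestamps": True}}

resp = requests.post(url, headers=headers, json=payload)
print(resp.json())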

LR

Hi @lingareddy_Alva. I appreciate your response.

I’ve looked for documentation but haven’t been able to find a solution. As you mentioned, I need to modify the serving endpoint’s inference logic to accept and handle this parameter. However, I don’t see any option to do that in the Serving UI, and I’m also unsure how to apply this update programmatically.
