<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Can't query Legacy Serving Endpoint in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/can-t-query-legacy-serving-endpoint/m-p/89879#M3665</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I was able to deploy an endpoint using legacy serving (It's the only option we have to deploy endpoints in DB). Now I am having trouble querying the endpoint itself. When I try to query it I get the following error:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="semsim_0-1726245119742.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11174iC8A3A98FEC1E1661/image-size/medium?v=v2&amp;amp;px=400" role="button" title="semsim_0-1726245119742.png" alt="semsim_0-1726245119742.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Here is the code I am using to query the endpoint:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import os
import requests
import numpy as np
import pandas as pd
import json


token = user_token

def create_tf_serving_json(data):
  return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model(dataset):
  url = 'url_to_model'
  headers = {'Authorization': f'Bearer {token}', 'Content-Type': 'application/json'}
  ds_dict = {"dataframe_split": dataset.to_dict(orient='split')} if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
  data_json = json.dumps(ds_dict, allow_nan=True)
  response = requests.request(method='POST', headers=headers, url=url, data=data_json)
  if response.status_code != 200:
    raise Exception(f'Request failed with status {response.status_code}, {response.text}')
  return response.json()

# Scoring a model that accepts pandas DataFrames
data =  pd.DataFrame([{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}])
score_model(data) #MODEL_VERSION_URI, DATABRICKS_API_TOKEN, 


# Scoring a model that accepts tensors
#data = np.asarray([[5.1, 3.5, 1.4, 0.2]])
#score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, data)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 13 Sep 2024 16:32:45 GMT</pubDate>
    <dc:creator>semsim</dc:creator>
    <dc:date>2024-09-13T16:32:45Z</dc:date>
    <item>
      <title>Can't query Legacy Serving Endpoint</title>
      <link>https://community.databricks.com/t5/machine-learning/can-t-query-legacy-serving-endpoint/m-p/89879#M3665</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I was able to deploy an endpoint using legacy serving (It's the only option we have to deploy endpoints in DB). Now I am having trouble querying the endpoint itself. When I try to query it I get the following error:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="semsim_0-1726245119742.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11174iC8A3A98FEC1E1661/image-size/medium?v=v2&amp;amp;px=400" role="button" title="semsim_0-1726245119742.png" alt="semsim_0-1726245119742.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Here is the code I am using to query the endpoint:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import os
import requests
import numpy as np
import pandas as pd
import json


token = user_token

def create_tf_serving_json(data):
  return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model(dataset):
  url = 'url_to_model'
  headers = {'Authorization': f'Bearer {token}', 'Content-Type': 'application/json'}
  ds_dict = {"dataframe_split": dataset.to_dict(orient='split')} if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
  data_json = json.dumps(ds_dict, allow_nan=True)
  response = requests.request(method='POST', headers=headers, url=url, data=data_json)
  if response.status_code != 200:
    raise Exception(f'Request failed with status {response.status_code}, {response.text}')
  return response.json()

# Scoring a model that accepts pandas DataFrames
data =  pd.DataFrame([{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}])
score_model(data) #MODEL_VERSION_URI, DATABRICKS_API_TOKEN, 


# Scoring a model that accepts tensors
#data = np.asarray([[5.1, 3.5, 1.4, 0.2]])
#score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, data)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Sep 2024 16:32:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/can-t-query-legacy-serving-endpoint/m-p/89879#M3665</guid>
      <dc:creator>semsim</dc:creator>
      <dc:date>2024-09-13T16:32:45Z</dc:date>
    </item>
    <item>
      <title>Re: Can't query Legacy Serving Endpoint</title>
      <link>https://community.databricks.com/t5/machine-learning/can-t-query-legacy-serving-endpoint/m-p/136535#M4385</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/104721"&gt;@semsim&lt;/a&gt;&amp;nbsp;, sorry for the delayed response.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="paragraph"&gt;Thanks for the screenshot—this pinpoints the problem.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Root cause from the error&lt;/H3&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;Your model’s predict path is trying to create or write to &lt;STRONG&gt;/Workspace/Shared&lt;/STRONG&gt;, and the serving container does not permit that filesystem location. The stack trace ends with PermissionError: [Errno 1] Operation not permitted: '/Workspace/Shared'.&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H3 class="paragraph"&gt;How to fix it in your model code&lt;/H3&gt;
&lt;DIV class="paragraph"&gt;Serving predict functions must be side‑effect free (no writes to workspace paths). Update your model to avoid writing under /Workspace and use ephemeral or supported storage instead:&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;Remove any os.makedirs(...) or file writes to /Workspace/… inside predict(). Use &lt;STRONG&gt;tempfile&lt;/STRONG&gt; and write under &lt;STRONG&gt;/tmp&lt;/STRONG&gt; or &lt;STRONG&gt;/local_disk0/tmp&lt;/STRONG&gt; for ephemeral files. Example:&lt;/LI&gt;
&lt;/UL&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python"&gt;import tempfile, os

def predict(self, context, model_input):
    tmpdir = tempfile.mkdtemp(dir="/local_disk0/tmp")  # or dir=None for /tmp
    out_path = os.path.join(tmpdir, "artifact.bin")
    with open(out_path, "wb") as f:
        f.write(b"...")  # if absolutely needed
    # ...perform inference without persisting to /Workspace
    return outputs&lt;/CODE&gt;&lt;/PRE&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;If you need to load static assets (tokenizers, feature maps, etc.), bundle them as &lt;STRONG&gt;MLflow model artifacts&lt;/STRONG&gt; and read them relative to the model directory, not from /Workspace. Avoid writes during inference.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;If persistence is actually required (for logs or results), write to external storage or databases the endpoint is authorized to access; do not write inside /Workspace from serving. Keep inference pure; log elsewhere asynchronously.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H3 class="paragraph"&gt;Client request adjustments (legacy serving)&lt;/H3&gt;
&lt;DIV class="paragraph"&gt;Your client code is mostly fine—make these tweaks to avoid common request issues:&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Use the legacy invocations URL format:&lt;BR /&gt;https://&amp;lt;workspace-host&amp;gt;/model/&amp;lt;registered-model-name&amp;gt;/&amp;lt;version-or-stage&amp;gt;/invocations.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Send the payload with requests.post(..., json=payload) rather than pre-dumping to data=.... Keep Content-Type: application/json and Authorization: Bearer &amp;lt;token&amp;gt;.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Match the scoring protocol to your MLflow version:
&lt;UL&gt;
&lt;LI&gt;For MLflow 2.x models, send a top-level &lt;STRONG&gt;"dataframe_split"&lt;/STRONG&gt; for pandas, or &lt;STRONG&gt;"inputs" / "instances"&lt;/STRONG&gt; for tensors.&lt;/LI&gt;
&lt;LI&gt;If the model was logged with MLflow 1.x, older formats like &lt;STRONG&gt;"dataframe_records"&lt;/STRONG&gt; may be required; protocol mismatches can yield BAD_REQUEST. The Serving tab “Query endpoint” shows the expected format for your exact model.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;Here’s a corrected minimal example:&lt;/DIV&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python"&gt;import os
import requests
import numpy as np
import pandas as pd

token = os.getenv("DATABRICKS_API_TOKEN") or "dapi_..."
model_uri = "https://&amp;lt;workspace-host&amp;gt;/model/&amp;lt;registered-model-name&amp;gt;/&amp;lt;Production-or-version&amp;gt;/invocations"

headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

def score_dataframe(df: pd.DataFrame):
    payload = {"dataframe_split": df.to_dict(orient="split")}
    resp = requests.post(model_uri, headers=headers, json=payload)
    if resp.status_code != 200:
        raise Exception(f"Request failed: {resp.status_code}, {resp.text}")
    return resp.json()

def score_tensor(arr: np.ndarray):
    payload = {"inputs": arr.tolist()}  # or {"instances": arr.tolist()}
    resp = requests.post(model_uri, headers=headers, json=payload)
    if resp.status_code != 200:
        raise Exception(f"Request failed: {resp.status_code}, {resp.text}")
    return resp.json()&lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;
&lt;H3 class="paragraph"&gt;Quick validation&lt;/H3&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;Use the model’s Serving tab “Query endpoint” in the UI to copy the exact URL and sample request payload; this confirms both the path and protocol your model expects.&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps, Louis.&lt;/DIV&gt;</description>
      <pubDate>Wed, 29 Oct 2025 10:35:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/can-t-query-legacy-serving-endpoint/m-p/136535#M4385</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-10-29T10:35:57Z</dc:date>
    </item>
  </channel>
</rss>

