Error status 400 when calling a model serving endpoint using a personal access token on Azure Databricks

Charley
New Contributor II

Hi all, I've deployed a model with MLflow, moved it to Production, and enabled serving, but when testing it in a Python notebook I get a 400 error. Code and details below:

import os
import requests
import json
import pandas as pd
import numpy as np

# Create two records for testing the prediction
test_input1 = {"OriginAirportCode": "SAT", "Month": 5, "DayofMonth": 5, "CRSDepHour": 13, "DayOfWeek": 7, "Carrier": "MQ", "DestAirportCode": "ORD", "WindSpeed": 9, "SeaLevelPressure": 30.03, "HourlyPrecip": 0}
test_input2 = {"OriginAirportCode": "ATL", "Month": 2, "DayofMonth": 5, "CRSDepHour": 8, "DayOfWeek": 4, "Carrier": "MQ", "DestAirportCode": "MCO", "WindSpeed": 3, "SeaLevelPressure": 31.03, "HourlyPrecip": 0}

# Package the inputs into a DataFrame for a local test run in the notebook
inputs = pd.DataFrame([test_input1, test_input2])
print(inputs)

def create_tf_serving_json(data):
    return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model(dataset):
    url = 'https://adb-<obfuscated>.azuredatabricks.net/model/Delay%20Estimator/Production/invocations'  # Enter your URL here
    personal_access_token = 'dapi2<obfuscated>853-2'  # Enter your Personal Access Token here
    headers = {'Authorization': f'Bearer {personal_access_token}'}
    data_json = dataset.to_dict(orient='split') if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
    response = requests.request(method='POST', headers=headers, url=url, json=data_json)
    if response.status_code != 200:
        raise Exception(f'Request failed with status {response.status_code}, {response.text}')
    return response.json()

score_model(inputs)

CMD ERROR OUTPUT:

Exception: Request failed with status 400, {"error_code": "BAD_REQUEST", "message": "The input must be a JSON dictionary with exactly one of the input fields {'dataframe_split', 'instances', 'inputs', 'dataframe_records'}. Received dictionary with input fields: ['index', 'columns', 'data']. IMPORTANT: The MLflow Model scoring protocol has changed in MLflow version 2.0. If you are seeing this error, you are likely using an outdated scoring request format. To resolve the error, either update your request format or adjust your MLflow Model's requirements file to specify an older version of MLflow (for example, change the 'mlflow' requirement specifier to 'mlflow==1.30.0'). If you are making a request using the MLflow client (e.g. via `mlflow.pyfunc.spark_udf()`), upgrade your MLflow client to a version >= 2.0 in order to use the new request format. For more information about the updated MLflow Model scoring protocol in MLflow 2.0, see https://mlflow.org/docs/latest/models.html#deploy-mlflow-models."}
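
For what it's worth, the mismatch the error describes is visible directly in the keys of the payload my code produces (illustrative, using the same inputs DataFrame as above):

# Keys of the payload score_model currently sends:
inputs.to_dict(orient='split').keys()
# dict_keys(['index', 'columns', 'data'])  -- none of the four accepted fields
# {'dataframe_split', 'instances', 'inputs', 'dataframe_records'}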

I've tested with earlier library versions with no joy, and also tried different ways to code it. Does anyone have any idea what the issue is or a better way to call it?

I'll be calling the same serving endpoint from a web app once this is working.

1 REPLY

nakany
New Contributor II

data_json in the score_model function should be defined as follows:

data_json = {"dataframe_split": dataset.to_dict(orient='split')} if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
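
That wrapping is what the MLflow 2.0 protocol named in the error expects. For completeness, a minimal sketch of the corrected score_model with just that one change (same obfuscated URL and token placeholders as in the original post, and the same create_tf_serving_json helper):

import pandas as pd
import requests

def score_model(dataset):
    url = 'https://adb-<obfuscated>.azuredatabricks.net/model/Delay%20Estimator/Production/invocations'  # your endpoint URL
    personal_access_token = 'dapi2<obfuscated>853-2'  # your Personal Access Token
    headers = {'Authorization': f'Bearer {personal_access_token}'}
    # MLflow 2.0 protocol: nest the split-orient dict under the 'dataframe_split' key
    data_json = (
        {'dataframe_split': dataset.to_dict(orient='split')}
        if isinstance(dataset, pd.DataFrame)
        else create_tf_serving_json(dataset)
    )
    response = requests.request(method='POST', headers=headers, url=url, json=data_json)
    if response.status_code != 200:
        raise Exception(f'Request failed with status {response.status_code}, {response.text}')
    return response.json()

score_model(inputs)  # should now return predictions instead of the 400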
