Error status 400 when calling a model serving endpoint using a personal access token on Azure Databricks

Charley
New Contributor II

Hi all, I've deployed a model with MLflow, moved it to Production, and enabled serving, but when testing it in a Python notebook I get a 400 error. Code and details below:

import os
import requests
import json
import pandas as pd
import numpy as np

# Create two records for testing the prediction
test_input1 = {"OriginAirportCode": "SAT", "Month": 5, "DayofMonth": 5, "CRSDepHour": 13, "DayOfWeek": 7, "Carrier": "MQ", "DestAirportCode": "ORD", "WindSpeed": 9, "SeaLevelPressure": 30.03, "HourlyPrecip": 0}
test_input2 = {"OriginAirportCode": "ATL", "Month": 2, "DayofMonth": 5, "CRSDepHour": 8, "DayOfWeek": 4, "Carrier": "MQ", "DestAirportCode": "MCO", "WindSpeed": 3, "SeaLevelPressure": 31.03, "HourlyPrecip": 0}

# Package the inputs into a DataFrame for a local test run in the notebook
inputs = pd.DataFrame([test_input1, test_input2])
print(inputs)

def create_tf_serving_json(data):
    return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model(dataset):
    url = 'https://adb-<obfuscated>.azuredatabricks.net/model/Delay%20Estimator/Production/invocations'  # Enter your URL here
    personal_access_token = 'dapi2<obfuscated>853-2'  # Enter your Personal Access Token here
    headers = {'Authorization': f'Bearer {personal_access_token}'}
    data_json = dataset.to_dict(orient='split') if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
    response = requests.request(method='POST', headers=headers, url=url, json=data_json)
    if response.status_code != 200:
        raise Exception(f'Request failed with status {response.status_code}, {response.text}')
    return response.json()

score_model(inputs)

CMD ERROR OUTPUT:

Exception: Request failed with status 400, {"error_code": "BAD_REQUEST", "message": "The input must be a JSON dictionary with exactly one of the input fields {'dataframe_split', 'instances', 'inputs', 'dataframe_records'}. Received dictionary with input fields: ['index', 'columns', 'data']. IMPORTANT: The MLflow Model scoring protocol has changed in MLflow version 2.0. If you are seeing this error, you are likely using an outdated scoring request format. To resolve the error, either update your request format or adjust your MLflow Model's requirements file to specify an older version of MLflow (for example, change the 'mlflow' requirement specifier to 'mlflow==1.30.0'). If you are making a request using the MLflow client (e.g. via `mlflow.pyfunc.spark_udf()`), upgrade your MLflow client to a version >= 2.0 in order to use the new request format. For more information about the updated MLflow Model scoring protocol in MLflow 2.0, see https://mlflow.org/docs/latest/models.html#deploy-mlflow-models."}
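
For what it's worth, the mismatch the error describes is visible directly in the keys of the payload my code produces (illustrative, using the same inputs DataFrame as above):

# Keys of the payload score_model currently sends:
inputs.to_dict(orient='split').keys()
# dict_keys(['index', 'columns', 'data'])  -- none of the four accepted fields
# {'dataframe_split', 'instances', 'inputs', 'dataframe_records'}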

I've tested with earlier library versions with no joy, and also tried different ways to code it. Does anyone have any idea what the issue is or a better way to call it?

I'll be calling the same serving endpoint from a web app once this is working.

1 REPLY

nakany
New Contributor II

data_json in the score_model function should be defined as follows:

data_json = {"dataframe_split": dataset.to_dict(orient='split')} if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
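
That wrapping is what the MLflow 2.0 protocol named in the error expects. For completeness, a minimal sketch of the corrected score_model with just that one change (same obfuscated URL and token placeholders as in the original post, and the same create_tf_serving_json helper):

import pandas as pd
import requests

def score_model(dataset):
    url = 'https://adb-<obfuscated>.azuredatabricks.net/model/Delay%20Estimator/Production/invocations'  # your endpoint URL
    personal_access_token = 'dapi2<obfuscated>853-2'  # your Personal Access Token
    headers = {'Authorization': f'Bearer {personal_access_token}'}
    # MLflow 2.0 protocol: nest the split-orient dict under the 'dataframe_split' key
    data_json = (
        {'dataframe_split': dataset.to_dict(orient='split')}
        if isinstance(dataset, pd.DataFrame)
        else create_tf_serving_json(dataset)
    )
    response = requests.request(method='POST', headers=headers, url=url, json=data_json)
    if response.status_code != 200:
        raise Exception(f'Request failed with status {response.status_code}, {response.text}')
    return response.json()

score_model(inputs)  # should now return predictions instead of the 400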
