Databricks Community

John22 · 12-13-2023

Hi all! I am trying to create a model serving endpoint for EasyOCR. I was able to create the experiment using a wrapper class with the code below:

# import libraries

import mlflow
import mlflow.pyfunc
import cloudpickle
import cv2
import re
import easyocr
import base64

import os
import requests
import numpy as np
import pandas as pd
import json
from PIL import Image

# pyfunc wrapper that loads the EasyOCR reader once and runs OCR on each input
class EasyOCRWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load your EasyOCR model during the deployment context
        self.reader = easyocr.Reader(['en'])
        
    def predict(self, context, model_input):
        # Make predictions using the loaded EasyOCR model
        return self.reader.readtext(model_input)

# set experiment name
experiment_name = "easyocr_test"

# get images with text
img = "/dbfs/FileStore/path_to_png_image1"
img2 = "/dbfs/FileStore/path_to_png_image2"

# set image examples
example_image_path = img
example_image_description = "Example image for OCR."

# image preprocessing function
def preprocessing(image_path):
    """
    Preprocesses an image in preparation for OCR detection.

    Args:
        image_path: path to the image file

    Returns:
        clean_image: the preprocessed (grayscale) image
    """
    raw_image = cv2.imread(image_path)
    clean_image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2GRAY)
    return clean_image

# create input example
input_example = preprocessing(img)

# start mlflow run
with mlflow.start_run(run_name="easyocr_run_v2"):

    mlflow.log_artifact(example_image_path, artifact_path="input_images")

    # Log the input image description as a parameter
    mlflow.log_param("input_image_description", example_image_description)

    # Log parameters
    mlflow.log_param("model", "EasyOCR")

    # Log the EasyOCR wrapper as an MLflow pyfunc model, with the preprocessed image as the input example
    mlflow.pyfunc.log_model("easyocr_model", python_model=EasyOCRWrapper(), input_example=input_example)

tracking_uri = mlflow.get_tracking_uri()
print(f"MLflow tracking URI: {tracking_uri}")

From here I was able to:

1. Test the loaded model and get an output.

2. Register the model.

3. Create the endpoint with the model.

Note that the model logged in this run did not get an input schema from this input example. When I tried to pass the image to the model endpoint with the code below, I got the following error:

# define helper functions to query the endpoint
def create_tf_serving_json2(data):
  return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model2(dataset):
  url = 'https://dbc-6fc85d31-c88e.cloud.databricks.com/serving-endpoints/eastocr_test_register_v12/invocations'
  headers = {'Authorization': f'Bearer {dbutils.secrets.get("SecretBucket", "DatabricksToken")}', 'Content-Type': 'application/json'}
  ds_dict = {'dataframe_split': dataset.to_dict(orient='split')} if isinstance(dataset, pd.DataFrame) else create_tf_serving_json2(dataset)
  data_json = json.dumps(ds_dict, allow_nan=True)
  response = requests.request(method='POST', headers=headers, url=url, data=data_json)
  if response.status_code != 200:
    raise Exception(f'Request failed with status {response.status_code}, {response.text}')
  return response.json()

# test with the preprocessed image
score_model2(input_example)

Exception: Request failed with status 400, {"error_code": "BAD_REQUEST", "message": "Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference. Error 'OpenCV(4.8.1) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:94: error: (-2:Unspecified error) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<1>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::<unnamed>::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'\n> Unsupported depth of input image:\n>     'VDepth::contains(depth)'\n> where\n>     'depth' is 4 (CV_32S)\n'", "stack_trace": "Traceback (most recent call last):\n  File \"/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/src/mlflowserving/scoring_server/__init__.py\", line 457, in transformation\n    raw_predictions = model.predict(data, params=params)\n  File \"/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/mlflow/pyfunc/__init__.py\", line 491, in predict\n    return _predict()\n  File \"/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/mlflow/pyfunc/__init__.py\", line 477, in _predict\n    return self._predict_fn(data, params=params)\n  File \"/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/mlflow/pyfunc/model.py\", line 473, in predict\n    return self.python_model.predict(self.context, self._convert_input(model_input))\n  File \"<command-86792240677018>\", line 9, in predict\n  File \"/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/easyocr/easyocr.py\", line 454, in readtext\n    img, img_cv_grey = reformat_input(image)\n  File \"/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/easyocr/utils.py\", line 751, in reformat_input\n    img = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)\ncv2.error: 
OpenCV(4.8.1) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:94: error: (-2:Unspecified error) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<1>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::<unnamed>::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'\n> Unsupported depth of input image:\n>     'VDepth::contains(depth)'\n> where\n>     'depth' is 4 (CV_32S)\n\n"}
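The `'depth' is 4 (CV_32S)` line means the array reached EasyOCR as 32-bit signed integers, while the `cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)` call in EasyOCR's `reformat_input` only accepts depths 0, 2 and 5 (`CV_8U`, `CV_16U`, `CV_32F`, the `VDepth = Set<0, 2, 5>` in the message). A small sketch of how the `uint8` type can get lost on the way: `create_tf_serving_json2` calls `tolist()`, and JSON only carries plain numbers. The 2x3 array stands in for `input_example`, and the `np.asarray` call is only an assumed stand-in for however the serving container rebuilds the tensor, not MLflow's actual code.

```python
import json

import numpy as np

# Tiny stand-in for the grayscale uint8 image returned by preprocessing().
pixels = np.array([[0, 128, 255], [10, 20, 30]], dtype=np.uint8)

# Same body create_tf_serving_json2 builds for a bare ndarray: tolist() drops the dtype.
body = json.dumps({"inputs": pixels.tolist()})

# Assumed stand-in for the server side: rebuild an array from the parsed JSON.
restored = np.asarray(json.loads(body)["inputs"])

print(body)
print(pixels.dtype, "->", restored.dtype)  # the values survive, the uint8 dtype does not
```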

From here I have tried:

1. Changing the data type of the inputs and, for array inputs, reshaping the dimensions.

2. Changing the wrapper class to include imports

3. Adding the libraries EasyOCR needs directly when I do the experiment run.
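Because the request body is JSON, a dtype chosen on the client side does not travel with the pixel values, so a cast is more likely to stick if it happens inside the wrapper's `predict`. A minimal, untested sketch under that assumption: the helper `to_uint8_image` is hypothetical, and it assumes `model_input` reaches `predict` either as a numpy array or as a pandas DataFrame holding the pixel grid (for example from a `dataframe_split` request).

```python
import numpy as np


def to_uint8_image(model_input):
    """Hypothetical helper: coerce served pixel data back into an 8-bit image."""
    # DataFrames expose .to_numpy(); anything else array-like goes through np.asarray.
    if hasattr(model_input, "to_numpy"):
        arr = model_input.to_numpy()
    else:
        arr = np.asarray(model_input)
    # Clip to the valid 0-255 pixel range, then cast to uint8 (CV_8U),
    # one of the depths cv2.cvtColor accepts.
    return np.clip(arr, 0, 255).astype(np.uint8)


# In EasyOCRWrapper.predict, the call would then become (sketch):
#     return self.reader.readtext(to_uint8_image(model_input))

# Simulate what the endpoint produced: an int32 (CV_32S) grayscale image.
served = np.array([[0, 128, 255], [10, 20, 300]], dtype=np.int32)
image = to_uint8_image(served)
print(image.dtype, image.shape)
```

Clipping before the cast keeps out-of-range values from wrapping around (300 becomes 255 rather than 44).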

Is it possible I am not creating the wrapper class correctly? Please advise on how I can get this endpoint to accept these preprocessed or raw image inputs. Thank you in advance.
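For raw image inputs, one pattern that avoids sending a pixel array at all is to send the encoded PNG file as a base64 string in a one-column `dataframe_split` body and decode it inside `predict`. The sketch below only exercises the encode/decode round trip with the standard library (`base64` is already imported in the code above); the column name `image_b64` and both helper names are hypothetical, and the fake bytes stand in for `open(img, "rb").read()`.

```python
import base64
import json


def encode_image_payload(image_bytes):
    """Hypothetical client helper: wrap raw image file bytes in a dataframe_split body."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"dataframe_split": {"columns": ["image_b64"], "data": [[b64]]}})


def decode_image_column(values):
    """Hypothetical predict-side step: turn each base64 string back into file bytes."""
    # In predict, `values` would be something like model_input["image_b64"].
    return [base64.b64decode(value) for value in values]


# Stand-in for open(img, "rb").read(); real code would read the PNG from disk.
fake_png = b"\x89PNG\r\n\x1a\n" + bytes(range(16))
body = encode_image_payload(fake_png)

# Simulate the image_b64 column that predict would receive after the server parses the body.
column = [row[0] for row in json.loads(body)["dataframe_split"]["data"]]
decoded = decode_image_column(column)
print(decoded[0] == fake_png)
```

Each decoded value could then be passed to `self.reader.readtext(...)`, since the EasyOCR README says `readtext` also accepts an image file as bytes; this assumes the endpoint passes the `image_b64` column through unchanged.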