I set up a model serving endpoint and created a monitoring dashboard for its performance. The problem is that my inference table doesn't get updated by the model serving endpoint.
To test the endpoint, I use the following code:
import random
import time

import requests
from pyspark.sql.functions import col

API_ROOT = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get()
API_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()

endpoint_name = config["model_serving_endpoint_name"]
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {API_TOKEN}"}

all_items = df_full_data.select(col("item_id")).distinct()

for user_id in range(100, 150):
    print(user_id)
    # Items this user has already rated
    items_rated_by_user = df_full_data.where(col("user_id") == user_id).select(col("item_id")).distinct()
    # Pick four items the user has not rated yet
    not_rated_items = [item.item_id for item in all_items.subtract(items_rated_by_user).limit(4).collect()]
    data = {
        "dataframe_records": [
            {"user_id": user_id, "item_id": not_rated_items[0], "rating": random.randint(1, 5)},
            {"user_id": user_id, "item_id": not_rated_items[1], "rating": random.randint(1, 5)},
            {"user_id": user_id, "item_id": not_rated_items[2], "rating": random.randint(1, 5)},
            {"user_id": user_id, "item_id": not_rated_items[3], "rating": random.randint(1, 5)},
        ]
    }
    response = requests.post(
        url=f"{API_ROOT}/serving-endpoints/{endpoint_name}/invocations", json=data, headers=headers
    )
    print(response.json())
    # Spread requests out over time
    time.sleep(random.randint(60 * 1, 60 * 3))
The code worked fine, and I could see my inference table fill with new rows. I then set up a monitoring pipeline that reads the inference table as a stream, unpacks the results, and writes them to a new table in the right format for monitoring. I adapted the code for this from the example here.
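In essence, the streaming read in that pipeline looks like this (a minimal sketch; the table name is my assumption based on the default <catalog>.<schema>.<endpoint>_payload naming for auto-captured payload tables):

# Minimal sketch of the streaming read over the auto-captured payload table.
# The table name below is an assumption based on the default naming convention.
payload_table = f"{catalog_name}.{model_schema}.{endpoint_name}_payload"
inference_stream = spark.readStream.table(payload_table)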
Now I want to check that the unpacked table automatically reacts to changes in the inference table whenever the endpoint receives a new request. For that, I used the code above to call the endpoint with some new data, but I can't see any updates in the inference table, although I can see from the endpoint UI that it received and processed the requests.
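To double-check, I also query the payload table directly for the most recent captured requests (a quick sketch; timestamp_ms and status_code are the columns I expect from the payload table schema):

from pyspark.sql import functions as F

# Show the ten most recently captured requests; timestamp_ms is
# milliseconds since epoch, status_code is the HTTP status of the call.
display(
    spark.table(payload_table)
    .withColumn("ts", F.from_unixtime(F.col("timestamp_ms") / 1000))
    .select("ts", "status_code")
    .orderBy(F.col("timestamp_ms").desc())
    .limit(10)
)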
Here is how I created the endpoint:
data = {
    "name": endpoint_name,
    "config": {
        "served_models": [
            {
                "model_name": model_name,
                "model_version": int(model_version),
                "workload_size": workload_size,
                "scale_to_zero_enabled": scale_to_zero,
                "workload_type": workload_type,
            }
        ],
        "auto_capture_config": {
            "catalog_name": catalog_name,
            "schema_name": model_schema,
        },
    },
}
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {API_TOKEN}"}
response = requests.post(
    url=f"{API_ROOT}/api/2.0/serving-endpoints", json=data, headers=headers
)
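To confirm that payload logging is actually attached to the endpoint, I fetch its current configuration with the same REST API and inspect auto_capture_config:

# Fetch the endpoint's current state and check that auto_capture_config
# shows up in the active config.
response = requests.get(
    url=f"{API_ROOT}/api/2.0/serving-endpoints/{endpoint_name}",
    headers=headers,
)
print(response.json().get("config", {}).get("auto_capture_config"))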
And here is the service log of the served model:
[5954fs284d] [2024-04-09 12:19:21 +0000] [7] [INFO] Starting gunicorn 21.2.0
[5954fs284d] [2024-04-09 12:19:21 +0000] [7] [INFO] Listening at: http://0.0.0.0:8080 (7)
[5954fs284d] [2024-04-09 12:19:21 +0000] [7] [INFO] Using worker: sync
[5954fs284d] [2024-04-09 12:19:21 +0000] [8] [INFO] Booting worker with pid: 8
[5954fs284d] [2024-04-09 12:19:21 +0000] [9] [INFO] Booting worker with pid: 9
[5954fs284d] [2024-04-09 12:19:21 +0000] [10] [INFO] Booting worker with pid: 10
[5954fs284d] [2024-04-09 12:19:21 +0000] [11] [INFO] Booting worker with pid: 11
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] 24/04/09 12:19:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.
Attached you can see the history of the inference table.
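For reference, this is roughly how I pull that history (DESCRIBE HISTORY on the same payload table as above; each batch of captured requests should appear as a new table version):

# Delta history of the payload table; new appends from the
# endpoint should show up as new versions here.
display(spark.sql(f"DESCRIBE HISTORY {payload_table}"))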
Any idea what the problem could be, or how I can debug this further?