Databricks

MohsenJ · ‎04-09-2024

I setup a model serving endpoint and created a monitoring dashboard to monitor its performance. The problem is my inference table doesn't get updated by model serving endpoints.

To test the endpoint I use the following code

import random
import time

API_ROOT = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get() 
API_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
endpoint_name = config["model_serving_endpoint_name"]
headers = {"Context-Type": "text/json", "Authorization": f"Bearer {API_TOKEN}"}

all_items = df_full_data.select(col("item_id")).distinct()
for user_id in range(100,150):
    print(user_id)
    items_not_rated_by_user = df_full_data.where(col("user_id")==user_id).select(col("item_id")).distinct()#collect()[0][0]
    no_rated_items =  [item.item_id for item in all_items.subtract(items_not_rated_by_user).limit(4).collect()]
    data = { "dataframe_records": [
        {"user_id":user_id, "item_id":no_rated_items[0], "rating": random.randint(1, 5)},
        {"user_id":user_id, "item_id":no_rated_items[1], "rating": random.randint(1, 5)},
        {"user_id":user_id, "item_id":no_rated_items[2], "rating": random.randint(1, 5)},
        {"user_id":user_id, "item_id":no_rated_items[2], "rating": random.randint(1, 5)},

        ]
            }
    response = requests.post(
     url=f"{API_ROOT}/serving-endpoints/{endpoint_name}/invocations", json=data, headers=headers
 )
    print(response.json())
    time.sleep(random.randint(60*1, 60*3))

The code worked fine and I could see my inference table filled in with new rows. I setup a monitoring pipeline in which it reads and unpack the result of this table to a new table with the right format for monitoring. For this I use the inference table as a streaming table. I adapted the code for this from the example here

Now I want to check if my unpacked table can automatically react to changes to my inference table whenever the endpoint receive a new request. For that I used the code above to call the endpoint with some new data. but I can't see any update in my inference table although I see from endpoint UI that it received and processed the requests.

here is how I created the endpoint

data = {
    "name": endpoint_name,
    "config": {
        "served_models": [
            {
                "model_name": model_name,
                "model_version": int(model_version),
                "workload_size": workload_size,
                "scale_to_zero_enabled": scale_to_zero,
                "workload_type": workload_type,
            }
        ],
        "auto_capture_config":{
            "catalog_name": catalog_name,
            "schema_name": model_schema,

    }
    },
}

headers = {"Context-Type": "text/json", "Authorization": f"Bearer {API_TOKEN}"}
 
response = requests.post(
    url=f"{API_ROOT}/api/2.0/serving-endpoints", json=data, headers=headers
)

and here is the service log of the serving model

[5954fs284d] [2024-04-09 12:19:21 +0000] [7] [INFO] Starting gunicorn 21.2.0
[5954fs284d] [2024-04-09 12:19:21 +0000] [7] [INFO] Listening at: http://0.0.0.0:8080 (7)
[5954fs284d] [2024-04-09 12:19:21 +0000] [7] [INFO] Using worker: sync
[5954fs284d] [2024-04-09 12:19:21 +0000] [8] [INFO] Booting worker with pid: 8
[5954fs284d] [2024-04-09 12:19:21 +0000] [9] [INFO] Booting worker with pid: 9
[5954fs284d] [2024-04-09 12:19:21 +0000] [10] [INFO] Booting worker with pid: 10
[5954fs284d] [2024-04-09 12:19:21 +0000] [11] [INFO] Booting worker with pid: 11
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[5954fs284d] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] Setting default log level to "WARN".
[5954fs284d] To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[5954fs284d] 24/04/09 12:19:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
[5954fs284d] 24/04/09 12:19:30 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.
[5954fs284d] 2024/04/09 12:19:33 INFO mlflow.spark: File '/model/sparkml' is already on DFS, copy is not necessary.

attached you see the history of the inference table.

any idea what could be the problem or how can I debugg the issue further?

Kaniz · ‎04-09-2024

Hi @MohsenJ,

The log shows several reconfiguration errors related to the logger configuration. These errors are likely due to missing or incorrect configuration settings. Here are some steps to troubleshoot:

Check Log Configuration: Verify that the logger configuration is correctly set up. Ensure that the necessary log properties (such as log level, log format, and log file location) are defined. You can typically configure logging in your application’s configuration files or directly in your code.
Logger Initialization: Make sure the logger is properly initialized when your application starts. If you’re using a specific logging library (such as Log4j or Python’s logging module), ensure that the logger is correctly instantiated.
Check Dependencies: Confirm that any required dependencies for logging (such as log4j libraries) are available and correctly configured.
The log also mentions setting the default log level to “WARN” for Spark. This is a common practice to reduce the verbosity of Spark logs. However, it’s essential to adjust the log level based on your debugging needs. You can change the log level using the following code:

spark.sparkContext.setLogLevel("INFO")  # Set the desired log level (e.g., INFO, DEBUG, ERROR)

If you’re using Spark streaming queries, ensure that you’ve specified a valid checkpoint location. The checkpoint location is crucial for maintaining query state and ensuring fault tolerance. Make sure the path exists and has appropriate permissions
Inspect other parts of your application (such as the model serving code) for any additional errors or warnings.
Check if there are any issues related to the model inference process itself (e.g., model loading, prediction, or response handling).

Remember to adapt these steps to your specific environment and codebase. If you encounter any specific error messages or need further assistance, feel free to share more details, and I’ll be happy to assist! 🚀

Databricks

the inference table table doesn't get updated

Databricks Community Social - May 2024

🔔 Attention Databricks Academy Users: SSO Implementation Incoming! Secure Your Account Today!

Announcing the General Availability of Databricks Asset Bundles

How to successfully build GenAI applications

Meet DBRX, the New Standard for High-Quality LLMs