03-16-2023 02:12 PM
When loading an xgboost.spark model from Databricks-hosted MLflow, following the provided instructions, the input size shown on the job is over 1 TB. Is anyone else using an xgboost.spark model and noticing the same behavior?
Below are some screenshots showing the input size. The job has been running for over 15 minutes just to load the model from MLflow.
03-16-2023 02:19 PM
Getting rid of the call to the full DBFS artifact path seemed to fix the issue for me.
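For anyone hitting the same thing, here is a minimal sketch of the two loading styles. The run ID and artifact path below are hypothetical placeholders, not values from the original post, and the actual `mlflow` load calls are left commented since they need a live tracking server:

```python
# Hypothetical placeholders -- substitute your own run ID and artifact path.
RUN_ID = "abc123def456"
ARTIFACT_PATH = "model"

def build_model_uri(run_id: str, artifact_path: str) -> str:
    """Build a tracking-server-relative 'runs:/' URI for a logged model."""
    return f"runs:/{run_id}/{artifact_path}"

# Slow in the reported case: loading straight from the full DBFS artifact path,
# e.g. something like:
#   import mlflow
#   model = mlflow.xgboost.load_model(
#       "dbfs:/databricks/mlflow-tracking/<exp_id>/<run_id>/artifacts/model"
#   )

# Reported fix: drop the full DBFS path and let MLflow resolve the artifact
# through the run instead:
#   import mlflow
#   model = mlflow.xgboost.load_model(build_model_uri(RUN_ID, ARTIFACT_PATH))

print(build_model_uri(RUN_ID, ARTIFACT_PATH))
```

The `runs:/` URI scheme is standard MLflow; whether it avoids the large input reads in every environment is an assumption based on this thread.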
04-26-2024 02:07 AM
Thank you very much @Data_Cowboy !!! I had the same issue. I even had 14 TiB 😄
Databricks should really fix this.
04-26-2024 05:38 AM
@dbx-user7354 Glad to hear this solution worked out for you. Makes me feel good that I came back and answered my own post 😀