06-25-2025 02:31 AM
Hi, I'm currently working on an automated job to produce forecasts using a notebook that works just fine when I run it manually but keeps failing when scheduled. Here is my code:
import mlflow

# URI of the logged model (from the training run)
logged_model = 'runs:/f715739d09624676b443cb02e7c98cc0/model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict using the model
results_df = loaded_model.predict(predict_df)
# Define group_column and time_column
group_column = "id" # Replace with your actual group column name
time_column = "week_date_format" # Replace with your actual time column name
target_column = "sales_value"
# Display the prediction results with timestamp for each id
final_df = results_df.reset_index()[[group_column, time_column, "yhat"]].tail(
forecast_horizon * predict_df[group_column].nunique()
)
final_df = final_df.rename(columns={'yhat': target_column})
display(final_df)

FYI, the other cells, where mlflow is installed and the model dependencies are set up, are working fine.
PS: I use serverless job compute.
06-25-2025 02:33 AM
and here is the error that I get:
TypeError: code() argument 13 must be str, not int
File <command-32974490616971>, line 16
13 logged_model = 'runs:/f715739d09624676b443cb02e7c98cc0/model'
15 # Load model as a PyFuncModel.
---> 16 loaded_model = mlflow.pyfunc.load_model(logged_model)
17 # Predict using the model
18 results_df = loaded_model.predict(predict_df)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-222c73bc-a540-4c26-aa9b-af028baf9eca/lib/python3.11/site-packages/mlflow/pyfunc/model.py:659, in _load_context_model_and_signature(model_path, model_config)
657 raise MlflowException("Python model path was not specified in the model configuration")
658 with open(os.path.join(model_path, python_model_subpath), "rb") as f:
--> 659 python_model = cloudpickle.load(f)
661 artifacts = {}
662 for saved_artifact_name, saved_artifact_info in pyfunc_config.get(
663 CONFIG_KEY_ARTIFACTS, {}
664 ).items():
06-25-2025 03:12 AM
Found a temporary workaround: use a serving endpoint instead, but it increases costs.
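For anyone curious what that workaround looks like in code, here is a rough sketch of calling a Databricks model serving endpoint over REST. The endpoint URL, the token handling, and the `dataframe_split` payload shape are assumptions based on the usual Databricks serving interface, not something taken from this thread:

```python
import json
import urllib.request

def build_payload(records):
    """Build a dataframe_split payload from a list of dicts with identical keys."""
    columns = list(records[0].keys())
    data = [[r[c] for c in columns] for r in records]
    return {"dataframe_split": {"columns": columns, "data": data}}

def score(endpoint_url, token, records):
    """POST records to a serving endpoint (hypothetical URL and token)."""
    req = urllib.request.Request(
        endpoint_url,
        data=json.dumps(build_payload(records)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `build_payload([{"id": 1, "week_date_format": "2025-01-06"}])` produces the columns/data structure the endpoint expects, and `score(...)` would return the predictions; the serving endpoint runs the model in the environment it was logged with, which is why it sidesteps the pickle error (at extra cost).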
10-08-2025 06:12 AM
Hey AmineM!
The error

TypeError: code() argument 13 must be str, not int

almost always points to a mismatch between the Python and cloudpickle versions used when the model was logged and those used by your job compute. This is especially common if you train your model on one Databricks Runtime (say, Python 3.8) and run your scheduled job on another (like Python 3.11), or across different serverless vs. interactive environments.

How to fix it: run the scheduled job in the same environment the model was trained in, and check the model's requirements.txt/conda.yaml for dependency mismatches (especially cloudpickle).