Hey AmineM!
If your MLflow model loads fine in a Databricks notebook but fails in a scheduled job on serverless compute with an error like:
TypeError: code() argument 13 must be str, not int
the root cause is almost always a mismatch between the Python version (or dependencies such as cloudpickle) used when the model was logged and the version used by your job's compute. This is especially common if you train the model on one Databricks Runtime (say, Python 3.8) and run the scheduled job on another (like Python 3.11), or across serverless vs. interactive environments.
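A quick way to confirm the mismatch is to run the same small diagnostic in both places (the notebook where the model was logged and the scheduled job) and compare the output. This is just a sketch using the standard library; cloudpickle may not be installed in every environment, so it is handled defensively:

```python
# Diagnostic sketch: run this in BOTH the training notebook and the
# scheduled job, then compare the two outputs line by line.
import sys
from importlib import metadata

def runtime_versions():
    """Return the interpreter and cloudpickle versions for this environment."""
    try:
        cloudpickle_version = metadata.version("cloudpickle")
    except metadata.PackageNotFoundError:
        cloudpickle_version = "not installed"
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}",
        "cloudpickle": cloudpickle_version,
    }

print(runtime_versions())
```

If the Python major.minor versions differ between the two outputs, that alone explains the code() TypeError, since pickled code objects are not portable across interpreter versions.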
How to fix it:
- Make sure your scheduled job runs on a compute/runtime with the same Python and package versions as where you trained/logged the model.
- If you can't control the job compute's environment (sometimes the case on serverless), re-log the model from a job running on that same compute type and use this new artifact for predictions.
- Optionally, check your model's requirements.txt / conda.yaml for dependency mismatches (especially cloudpickle).
- Model Serving avoids this problem because it rebuilds the model's logged environment for you, but it costs more; for batch scoring, matching environments directly is the better practice.
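To act on the third bullet, you can compare the pins in the model's logged requirements file against what the job environment actually has installed. In a real job you would first fetch that file with MLflow's mlflow.pyfunc.get_model_dependencies(model_uri); the helper below is a self-contained sketch that takes the file's text and reports any pinned package whose installed version differs (the function name and regex are my own, not an MLflow API):

```python
# Sketch: report pinned requirements that don't match this environment.
# Feed it the text of the requirements.txt logged with the model, e.g.
# the file returned by mlflow.pyfunc.get_model_dependencies(model_uri).
import re
from importlib import metadata

def find_mismatches(requirements_text):
    """Return {package: (pinned, installed)} for pins that differ locally."""
    mismatches = {}
    for line in requirements_text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9_.-]+)==([\w.]+)", line)
        if not match:
            continue  # skip comments, unpinned lines, and blank lines
        name, pinned = match.groups()
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package missing entirely in this environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Any cloudpickle entry showing up in the result is a strong hint you have hit exactly this serialization incompatibility.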
This is a known serialization problem; matching environments is the robust solution. I hope this is helpful!
Best,
Sarah