Great observation! The log lines Using worker: sync and Using worker: gevent report the worker class used by Gunicorn, the web server behind many MLflow model deployments (such as Databricks model serving or other MLflow-compatible environments).
The error:
[ERROR] Worker (pid:11) was sent code 132
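...often indicates that the worker process crashed during model loading or execution. Under the usual convention, 132 is 128 + 4, which points to signal 4 (SIGILL, illegal instruction), so the crash most likely originates in native code rather than in Python itself. The worker type (sync vs gevent) changes how threads and concurrency are handled, which matters a lot when you're using libraries like ANNOY, which might rely on file descriptors or multithreading.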
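To check which signal a code like this maps to, here is a small shell sketch. It assumes the 132 in the log follows the common "128 + signal number" convention; the variable names are just for illustration.

```shell
#!/bin/sh
# Exit statuses above 128 conventionally mean
# "terminated by signal number (status - 128)".
code=132
sig=$((code - 128))

# kill -l translates a signal number into its name.
name=$(kill -l "$sig")

echo "code $code -> signal $sig ($name)"
```

If the name comes back as ILL, the worker hit an illegal CPU instruction, which is worth keeping in mind when comparing the sync and gevent runs.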
- Solution: Force model serving to use the gevent worker
MLflow doesn't let you directly set the Gunicorn worker type via the Python API (e.g., mlflow.models.serve) or the Databricks model serving configuration.
However, if you're serving locally or managing the model server yourself (e.g., using MLflow + Docker), you can manually specify the worker type using Gunicorn flags.
Example (Manual MLflow Serve):
gunicorn -b 0.0.0.0:5000 -w 4 --worker-class gevent mlflow.pyfunc.scoring_server.wsgi:app
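This way you can serve your MLflow model with gevent workers explicitly. Note that Gunicorn's gevent worker class needs the gevent package installed in the serving environment (pip install gevent).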
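If you want to switch between sync and gevent without editing the command each time, a wrapper script along these lines may help. This is only a sketch: WORKER_CLASS, WORKERS, BIND, and DRY_RUN are hypothetical variable names chosen for this example, and the app path is simply the one from the command above. How the scoring server finds the model to load depends on your MLflow version and is not shown here.

```shell
#!/bin/sh
# Hypothetical settings for this sketch; override them from the environment.
WORKER_CLASS="${WORKER_CLASS:-gevent}"   # set to "sync" to compare behavior
WORKERS="${WORKERS:-4}"
BIND="${BIND:-0.0.0.0:5000}"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the command

# Same Gunicorn invocation as in the example above, with the values substituted.
CMD="gunicorn -b $BIND -w $WORKERS --worker-class $WORKER_CLASS mlflow.pyfunc.scoring_server.wsgi:app"

if [ "$DRY_RUN" = "1" ]; then
    # Print the command so it can be reviewed before anything starts.
    echo "$CMD"
else
    # Replace this shell with Gunicorn; requires gunicorn (and gevent,
    # when WORKER_CLASS=gevent) to be installed.
    exec $CMD
fi
```

For example, saving this as a script and running it with WORKER_CLASS=sync DRY_RUN=0 would start the same server with sync workers, which makes it easier to confirm whether the crash really depends on the worker type.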
Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa