Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

What can I do to reduce the number of MLflow API calls I make?

Joseph_B
Databricks Employee

I'm fitting multiple models in parallel. For each one, I'm logging lots of params and metrics to MLflow. I'm hitting rate limits, causing problems in my jobs.

1 REPLY

Joseph_B
Databricks Employee

The first thing to try is logging in batches. If you log each param and metric separately, you make one API call per param and one per metric. Instead, use the batch logging APIs, e.g. "log_params" instead of "log_param": https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_params

If you're logging 10 params and 10 metrics per model, that's 20 calls reduced to 2, cutting the number of API calls you make by a factor of 10.

If this optimization is still insufficient for you, then I'd recommend doing 2 things:

1) Short-term workaround: You can save data you wish to log to a table and log it in a follow-up process later.

2) Medium/long-term: I'd recommend working with your Databricks account team on a solution that matches your needs. In general, the best option is to reorganize how models are fit or logged so that logging is batched more efficiently, but the best solution varies by case, so your account team is the right partner here.
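The short-term workaround in (1) can be sketched roughly as follows. This is an assumption about how you might structure it, using a local JSON file as a stand-in for the table; a follow-up job would read the rows back and replay them with the batch logging APIs:

```python
import json
import os
import tempfile


class RunLogBuffer:
    """Collect params/metrics in memory during model fitting and persist
    them (here to a JSON file, standing in for a table) so a later,
    lower-frequency job can log them to MLflow in batches."""

    def __init__(self):
        self.rows = []

    def record(self, run_name, params, metrics):
        # No MLflow API call happens here; we only buffer the data.
        self.rows.append({"run": run_name, "params": params, "metrics": metrics})

    def flush(self, path):
        # The follow-up process reads this file and calls
        # mlflow.log_params / mlflow.log_metrics once per run.
        with open(path, "w") as f:
            json.dump(self.rows, f)
        return len(self.rows)


buf = RunLogBuffer()
buf.record("model_a", {"alpha": 0.1}, {"rmse": 1.2})
buf.record("model_b", {"alpha": 0.5}, {"rmse": 0.9})
out = os.path.join(tempfile.mkdtemp(), "pending_logs.json")
n = buf.flush(out)
```

Writing to a Delta table instead of a file would be the more natural fit on Databricks, but the idea is the same: decouple fast parallel fitting from rate-limited logging.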
