12-13-2021 03:58 PM
I am facing an issue loading an ML artifact for a specific run. I search the experiment runs to get a specific run_id (see https://www.mlflow.org/docs/latest/rest-api.html#search-runs), and the request fails with:

API request to https://eastus-c3.azuredatabricks.net/api/2.0/mlflow/runs/search failed with exception HTTPSConnectionPool(host='eastus-c3.azuredatabricks.net', port=443): Max retries exceeded with url: /api/2.0/mlflow/runs/search (Caused by ResponseError('too many 429 error responses'))
```python
import mlflow
from mlflow.tracking import MlflowClient

# Search the experiment runs, filtering on customer and product_key, ordered by start time
# (param values in MLflow filter strings must be quoted)
query = f"params.product_key = '{product_key}' and params.customer = '{customer}'"
runs_df = mlflow.search_runs([experiment.experiment_id], filter_string=query, order_by=["start_time DESC"])

# Get the latest run recorded
run_id = runs_df.run_id.values[0]
artifact_uri = runs_df.artifact_uri.values[0]

client = MlflowClient()
```
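For context, I then use the run_id to load the model for scoring, roughly like this (the "model" artifact path and `input_df` are illustrative and depend on how the model was logged):

```python
import mlflow.pyfunc

# "model" is assumed to be the artifact path used when the model was logged
model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
predictions = model.predict(input_df)  # input_df: the features to score
```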
429 is an HTTP response status code indicating that the client application has surpassed its rate limit, i.e. the number of requests it can send in a given period of time. Is there any fix for that?
I am running the search_runs() API inside a pandas_udf that searches at the customer and product_key level of my DataFrame to find the right logged model and artifact to load for inference, roughly like the sketch below. Since the inference itself is quick and there are around 4,000 product_keys, I end up hitting the MLflow search API around 30-40 times per minute.
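A rough, illustrative sketch of that pattern (not my exact code; `experiment_id` and the param keys stand in for my setup):

```python
import pandas as pd
import mlflow
from pyspark.sql.functions import pandas_udf

@pandas_udf("string")
def latest_run_id(customer: pd.Series, product_key: pd.Series) -> pd.Series:
    ids = []
    for c, p in zip(customer, product_key):
        # One REST call per (customer, product_key) pair -> rate-limited at scale
        query = f"params.product_key = '{p}' and params.customer = '{c}'"
        runs = mlflow.search_runs([experiment_id], filter_string=query,
                                  order_by=["start_time DESC"])
        ids.append(runs.run_id.values[0])
    return pd.Series(ids)
```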
Any thoughts on this?
Accepted Solutions
12-16-2021 06:30 PM
Yes, you will hit rate limits if you query the API that fast in parallel. Do you just want to manipulate the run data in an experiment with Spark? You can simply load all of that data into a DataFrame with spark.read.format("mlflow-experiment").load("... your experiment path ..."). With all the data loaded you can sort, query, etc. (or convert to a pandas DataFrame if you want).
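A minimal sketch of that approach, assuming a Databricks notebook where `spark` is available, `<experiment_id>` stands in for your experiment ID or workspace path, and `customer`/`product_key` hold the values you were passing to search_runs():

```python
# Load every run of the experiment once as a Spark DataFrame;
# columns include run_id, start_time, artifact_uri, and params (a map column)
runs = spark.read.format("mlflow-experiment").load("<experiment_id>")

# Filter on individual param keys and take the most recent run
latest = (runs
          .filter(runs.params["customer"] == customer)
          .filter(runs.params["product_key"] == str(product_key))
          .orderBy("start_time", ascending=False)
          .first())

run_id, artifact_uri = latest.run_id, latest.artifact_uri
```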
12-20-2021 03:30 PM
Thanks Sean, that's exactly what I need without hitting the API: loading the experiment runs once and then manipulating and filtering them as needed.