
MLflow stopped working after a few successful runs

zhh210
New Contributor III

Both the mlflow.log_metrics() call and the web UI worked for the first few days, then started failing at some point. The log gives no clue as to why. This looks suspicious: is there a limit on the number of MLflow requests? It's quite annoying that the web UI stops displaying finished MLflow runs without any message explaining why.

  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 1050, in get_experiment_by_name
    return MlflowClient().get_experiment_by_name(name)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/tracking/client.py", line 451, in get_experiment_by_name
    return self._tracking_client.get_experiment_by_name(name)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 197, in get_experiment_by_name
    return self.store.get_experiment_by_name(name)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 290, in get_experiment_by_name
    response_proto = self._call_endpoint(GetExperimentByName, req_body)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 273, in call_endpoint
    response = http_request(
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 184, in http_request
    raise MlflowException("API request to %s failed with exception %s" % (url, e))
mlflow.exceptions.MlflowException: API request to https://community.cloud.databricks.com/api/2.0/mlflow/experiments/get-by-name failed with exception HTTPSConnectionPool(host='community.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/mlflow/experiments/get-by-name?experiment_name=%2Fabr-sagemaker-mid (Caused by ResponseError('too many 429 error responses'))
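
For readers who hit the same `too many 429 error responses` failure: HTTP 429 means the server is rate limiting the client. One common client-side mitigation is to retry with exponential backoff. The block below is a minimal sketch of that idea, not part of the original post and not MLflow's own retry logic. The helper name `call_with_backoff` and all of its parameters are hypothetical. It will not help if the endpoint has been disabled on the server side.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn() and retry on failure, doubling the wait each time.

    Hypothetical helper for illustration. `sleep` is injectable so the
    function can be exercised without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff, capped at max_delay, plus up to 10% jitter
            # so that parallel clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Possible usage with the call from the traceback (assumes mlflow is installed
# and a tracking URI is already configured):
#
#   import mlflow
#   from mlflow.exceptions import MlflowException
#
#   experiment = call_with_backoff(
#       lambda: mlflow.get_experiment_by_name("/abr-sagemaker-mid"),
#       retry_on=(MlflowException,),
#   )
```

Reducing the number of requests in the first place, for example by logging several metrics in one `mlflow.log_metrics()` call rather than one call per metric, is usually a better first step than retrying.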


2 REPLIES

Anonymous
Not applicable

The MLflow Slack may be a good place to ask.

zhh210
New Contributor III

It seems Databricks has disabled showing existing runs. A small pop-up window flashes for about half a second saying "This endpoint has temporarily been disabled. Please contact Databricks support". I almost missed it, because the pop-up appears at random and disappears quickly.

