Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

MLflow stopped working after a few successful runs

zhh210
New Contributor III

Both the mlflow.log_metrics() call and the web UI worked for the first few days, then started failing at some point. The log gives no clue why this is happening. I suspect a rate limit: is there a cap on the number of MLflow requests? It's quite annoying that the web UI stops displaying finished MLflow runs without any message explaining why.

  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 1050, in get_experiment_by_name
    return MlflowClient().get_experiment_by_name(name)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/tracking/client.py", line 451, in get_experiment_by_name
    return self._tracking_client.get_experiment_by_name(name)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 197, in get_experiment_by_name
    return self.store.get_experiment_by_name(name)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 290, in get_experiment_by_name
    response_proto = self._call_endpoint(GetExperimentByName, req_body)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 273, in call_endpoint
    response = http_request(
  File "/home/ec2-user/anaconda3/envs/abr/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 184, in http_request
    raise MlflowException("API request to %s failed with exception %s" % (url, e))
mlflow.exceptions.MlflowException: API request to https://community.cloud.databricks.com/api/2.0/mlflow/experiments/get-by-name failed with exception HTTPSConnectionPool(host='community.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/mlflow/experiments/get-by-name?experiment_name=%2Fabr-sagemaker-mid (Caused by ResponseError('too many 429 error responses'))
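
For context, the last line of the traceback reports repeated HTTP 429 responses, which is the status code for "Too Many Requests". Below is a minimal sketch of one common client-side mitigation: retrying with exponential backoff and jitter. The helper name `call_with_backoff` and its parameters are hypothetical, not part of MLflow, and matching on "429" in the exception message is an assumption based on the error text above.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(); if it fails with a 429-looking error, wait and retry.

    Hypothetical helper: it detects rate limiting by looking for "429"
    in the exception message, as seen in the traceback above.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            is_rate_limit = "429" in str(exc)
            is_last_attempt = attempt == max_attempts - 1
            if not is_rate_limit or is_last_attempt:
                raise  # unrelated error, or retries exhausted
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # plus random jitter so parallel clients do not retry in sync.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Possible usage, assuming mlflow is installed and configured:
#   import mlflow
#   call_with_backoff(lambda: mlflow.log_metrics({"loss": 0.1, "acc": 0.9}, step=10))
```

Logging several metrics in one mlflow.log_metrics() call, rather than one request per metric, also reduces the number of requests. If the server has disabled the endpoint outright rather than throttling it, retrying will not help.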


2 REPLIES

Anonymous
Not applicable

The MLflow Slack may be a good place to ask:

zhh210
New Contributor III

It seems Databricks has disabled showing existing runs. A small pop-up window flashes for half a second saying "This endpoint has temporarily been disabled. Please contact Databricks support". I almost missed it, as the window appears at random and is ephemeral.

