Databricks Community

j_b · 09-06-2022

I'm trying out managed MLflow on Databricks Community edition, with tracking data saved on Databricks and artifacts saved on my own AWS S3 bucket.

I created one experiment and logged 768 runs in it. When I try to list the runs with the list_run_infos method, it returns at most 399 runs instead of 768. Is this a limit imposed on Community Edition?

Code:

from mlflow.tracking import MlflowClient
from mlflow.entities import ViewType

client = MlflowClient()
exp_id = client.get_experiment_by_name("exp_name").experiment_id
load_max = 10000  # upper bound on the number of runs to request

# Request only active (non-deleted) runs in the experiment
run_list = client.list_run_infos(
    experiment_id=exp_id,
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=load_max,
)
print(len(run_list))

Output:

399

sean_owen · 09-19-2022

Are all 768 of them 'active'? The call above lists only active runs, since it passes run_view_type=ViewType.ACTIVE_ONLY.
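One way to check this is to fetch runs with ViewType.ALL and tally them by lifecycle stage. The sketch below is only illustrative: tally_lifecycle_stages is a hypothetical helper name, and it assumes the input is a list of Run objects (as returned by search_runs), each exposing info.lifecycle_stage.

```python
from collections import Counter


def tally_lifecycle_stages(runs):
    """Count runs per lifecycle stage (for example "active" or "deleted").

    `runs` is assumed to be an iterable of MLflow Run objects, each with
    an `info.lifecycle_stage` attribute.
    """
    return Counter(run.info.lifecycle_stage for run in runs)


# Hypothetical usage against a tracking server (not executed here):
#   from mlflow.tracking import MlflowClient
#   from mlflow.entities import ViewType
#   client = MlflowClient()
#   exp_id = client.get_experiment_by_name("exp_name").experiment_id
#   runs = client.search_runs([exp_id], run_view_type=ViewType.ALL)
#   print(tally_lifecycle_stages(runs))
```

Note that a single search_runs call may itself return only one page of results, so the tally covers only the runs actually retrieved.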

Note that this method returns a paginated result. I am not sure that's the issue here, but a single call is not going to return every run.

I don't believe there is otherwise a limit here.

Finally, on a related note, this method is deprecated in favor of search_runs anyway.
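For illustration, here is a minimal sketch of collecting every run by following page tokens with search_runs. It assumes each returned page is a list-like object carrying a token attribute for the next page (as MLflow's PagedList does); fetch_all_runs and page_size are hypothetical names chosen for this example. Whether pagination actually explains the 399 figure in this thread is not confirmed.

```python
def fetch_all_runs(client, experiment_id, run_view_type, page_size=1000):
    """Collect all runs in one experiment by following page tokens.

    `client` is assumed to be an MlflowClient; `run_view_type` is a
    ViewType value such as ViewType.ACTIVE_ONLY or ViewType.ALL.
    """
    runs = []
    page_token = None
    while True:
        page = client.search_runs(
            experiment_ids=[experiment_id],
            run_view_type=run_view_type,
            max_results=page_size,
            page_token=page_token,
        )
        runs.extend(page)
        # A missing or empty token means there are no further pages.
        page_token = getattr(page, "token", None)
        if not page_token:
            break
    return runs


# Hypothetical usage against a tracking server (not executed here):
#   from mlflow.tracking import MlflowClient
#   from mlflow.entities import ViewType
#   client = MlflowClient()
#   exp_id = client.get_experiment_by_name("exp_name").experiment_id
#   all_runs = fetch_all_runs(client, exp_id, ViewType.ALL)
#   print(len(all_runs))
```

Passing the client in as an argument keeps the loop independent of how the client is configured, and makes it easy to try with ViewType.ACTIVE_ONLY and ViewType.ALL to compare counts.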

Anonymous · 09-22-2022

Hi @jae baak

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!