mlflow RESOURCE_ALREADY_EXISTS

mangeldfz
New Contributor III

I tried to log a run in my Databricks workspace, and I get the following error whenever I try to log any run: RESOURCE_ALREADY_EXISTS.

I could replicate the error with the following code:

import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
 
mlflow.set_experiment('/learning/Mlflow-Full-Example/test-mlflow')
 
with mlflow.start_run(run_name='silly_run-test') as run:
  mlflow.log_param('seed', 777)

The error is shown below. I don't know what to do about the conflict with that AML experiment.

In case the quality of the error image is not good enough, here is the full message:

RestException: RESOURCE_ALREADY_EXISTS: Failed to create AML experiment for experiment id=1823487114958629, name=/learning/Mlflow-Full-Example/test-mlflow, artifactLocation=dbfs:/databricks/mlflow-tracking/1823487114958629. There is an existing AML experiment with id=fa0eed6c-afd5-458b-9835-88903b535e04 and name='/adb/6432554542138879/1823487114958629/learning/Mlflow-Full-Example/test-mlflow' and artifactLocation='' that is not compatible.
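For anyone comparing their own error against this one, it may help to pull the two ids out of the message. Below is a minimal sketch using only the standard library. The function name `parse_aml_conflict` and the regular expression are illustrative assumptions written against the single message quoted above; they are not part of MLflow, and other versions may word the message differently.

```python
import re
from typing import Dict, Optional

# Pattern derived from the one error message quoted in this thread.
# It is an assumption that other occurrences follow the same wording.
_CONFLICT_RE = re.compile(
    r"experiment id=(?P<experiment_id>\d+), "
    r"name=(?P<name>\S+), "
    r"artifactLocation=(?P<artifact_location>\S+)\. "
    r"There is an existing AML experiment with id=(?P<aml_id>[0-9a-f-]+)"
)


def parse_aml_conflict(message: str) -> Optional[Dict[str, str]]:
    """Return the ids and paths named in the conflict message, or None if it does not match."""
    match = _CONFLICT_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # The message from the post, split across string literals for readability.
    message = (
        "RESOURCE_ALREADY_EXISTS: Failed to create AML experiment for "
        "experiment id=1823487114958629, "
        "name=/learning/Mlflow-Full-Example/test-mlflow, "
        "artifactLocation=dbfs:/databricks/mlflow-tracking/1823487114958629. "
        "There is an existing AML experiment with "
        "id=fa0eed6c-afd5-458b-9835-88903b535e04 and "
        "name='/adb/6432554542138879/1823487114958629/learning/Mlflow-Full-Example/test-mlflow' "
        "and artifactLocation='' that is not compatible."
    )
    print(parse_aml_conflict(message))
```

The AML id extracted this way is the one to look for on the Azure Machine Learning side when checking whether the two workspaces are linked.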

1 ACCEPTED SOLUTION

Accepted Solutions

Prabakar
Esteemed Contributor III

Hi @Miguel Ángel Fernández, it’s not recommended to “link” the Databricks and AML workspaces, as we are seeing more problems with this setup. You can refer to the instructions at the following link for using MLflow with AML: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow

You can refer to https://github.com/MicrosoftDocs/azure-docs/issues/80298 to unlink.

8 REPLIES

Kaniz
Community Manager

Hi @mangeldfz! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, I will get back to you soon. Thanks.

mangeldfz
New Contributor III

Hi Kaniz, thanks for your comment. I found someone else on the internet who has had the same problem since a few days ago, so I think it has nothing to do with my workspace. I hope this will be solved soon; it is blocking all our machine learning development.

Hubert-Dudek
Esteemed Contributor III

It seems like a name conflict. Can you rename the experiment to something other than test-mlflow?

You can also try cleaning up the directories if there is nothing important in them (but I am not sure whether /adb is on DBFS storage):

dbutils.fs.rm("/databricks/mlflow-tracking/1823487114958629", recurse=True)
dbutils.fs.rm("/adb/6432554542138879/1823487114958629/learning/Mlflow-Full-Example/test-mlflow", recurse=True)

I tried renaming both the experiment and the run_name, and it does not work; the error stays the same. When I look up the conflicting experiment with client.list_experiments(), I can see its AML id. This is the experiment I have the conflict with, but the conflict seems to come from the AML side:

Experiment: artifact_location='dbfs:/databricks/mlflow-tracking/2288118769165005', experiment_id='2288118769165005', lifecycle_stage='active', name='/learning/Mlflow-Full-Example/test-mlflow-renamed2', tags={'mlflow.AML_EXPERIMENT_ID': '594197a2-c16e-4e14-8040-e398833198ff',
 'mlflow.experimentType': 'MLFLOW_EXPERIMENT',
 'mlflow.ownerEmail': '***.***@***.***',
 'mlflow.ownerId': 'YYYYYY'}

I can delete the whole experiment using the Experiments UI, but if I try to delete any experiment using mlflow.delete_experiment(), I get the same error as at the beginning. Deleting through the UI does not resolve the conflict either. Also, I cannot find the /adb directory anywhere; it is not in DBFS.

It seems that for every experiment I create, MLflow also creates an associated AML experiment, and all AML experiments point to the same artifactLocation="" by default. It does not matter if you delete all experiments using the UI: the garbage collector detects that there is (or was) an AML experiment with artifactLocation="", so there is a conflict for any new experiment you try to log to.
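To see how many experiments carry the AML tag shown in the listing above, something like the following sketch could be used. It only assumes that each experiment object exposes `name` and `tags` attributes, as in that listing; the helper name `aml_linked_experiments` and the stand-in objects are hypothetical and exist only so that the example runs without a tracking server.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable

# Tag key exactly as it appears in the experiment listing above.
AML_TAG = "mlflow.AML_EXPERIMENT_ID"


def aml_linked_experiments(experiments: Iterable[Any]) -> Dict[str, str]:
    """Map experiment name to AML experiment id for experiments that carry the AML tag."""
    linked = {}
    for exp in experiments:
        # Treat a missing or empty tags attribute as "no tags".
        tags = getattr(exp, "tags", None) or {}
        if AML_TAG in tags:
            linked[exp.name] = tags[AML_TAG]
    return linked


if __name__ == "__main__":
    # Stand-in objects; the second one is invented to show the unlinked case.
    sample = [
        SimpleNamespace(
            name="/learning/Mlflow-Full-Example/test-mlflow-renamed2",
            tags={AML_TAG: "594197a2-c16e-4e14-8040-e398833198ff"},
        ),
        SimpleNamespace(name="/hypothetical/unlinked-experiment", tags={}),
    ]
    print(aml_linked_experiments(sample))
```

In a workspace, the input would be whatever the tracking client returns, for example client.list_experiments() as used above. As far as I know, newer MLflow releases replace that call with search_experiments(), so the exact call depends on the installed version.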

mangeldfz
New Contributor III

Hi Prabakar, thank you so much for your response. In the end, we decided to delete the Azure Machine Learning service, because the ARM template in the reference you provided fails with the following error:

error-arm-template-deploy 

I wonder whether just redeploying the Azure Machine Learning service in the same resource group will be enough to set up both services properly, or whether the two will end up linked again. I am assuming, of course, that there will be no MLflow communication between Databricks and the new Azure Machine Learning service.

Anonymous
Not applicable

Hi!

I am facing the same problems with linked workspaces and wonder if you managed to solve your problem by unlinking them.
