11-17-2021 01:13 AM
I tried to log some runs in my Databricks workspace and I'm getting a RESOURCE_ALREADY_EXISTS error whenever I try to log any run.
I could replicate the error with the following code:
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

# Create/activate the experiment at a workspace path, then log a single param
mlflow.set_experiment('/learning/Mlflow-Full-Example/test-mlflow')
with mlflow.start_run(run_name='silly_run-test') as run:
    mlflow.log_param('seed', 777)
The error is the following; I don't know what to do about the conflict with that AML experiment.
In case the error screenshot isn't clear enough, here is the full message:
RestException: RESOURCE_ALREADY_EXISTS: Failed to create AML experiment for experiment id=1823487114958629, name=/learning/Mlflow-Full-Example/test-mlflow, artifactLocation=dbfs:/databricks/mlflow-tracking/1823487114958629. There is an existing AML experiment with id=fa0eed6c-afd5-458b-9835-88903b535e04 and name='/adb/6432554542138879/1823487114958629/learning/Mlflow-Full-Example/test-mlflow' and artifactLocation='' that is not compatible.
11-17-2021 03:43 AM
Hi Kaniz, thanks for your comment. I found someone else on the internet with the same problem, which started a few days ago, so I think it has nothing to do with my workspace. I hope this will be solved soon; it is blocking all our machine learning development.
11-17-2021 04:10 AM
It seems like a name conflict. Can you rename the experiment to something different than test-mlflow?
You can also try to clean up the directories if there is nothing important in them (but I am not sure whether /adb is on DBFS storage):
dbutils.fs.rm("/databricks/mlflow-tracking/1823487114958629", recurse=True)
dbutils.fs.rm("/adb/6432554542138879/1823487114958629/learning/Mlflow-Full-Example/test-mlflow", recurse=True)
11-17-2021 04:49 AM
I tried renaming the experiment and the run_name, and it does not work; the error stays the same. When I search for the experiment it is conflicting with, I can find the AML id using client.list_experiments(). This is the experiment I have the conflict with, and it seems the conflict has to do with the AML part:
Experiment: artifact_location='dbfs:/databricks/mlflow-tracking/2288118769165005',
            experiment_id='2288118769165005',
            lifecycle_stage='active',
            name='/learning/Mlflow-Full-Example/test-mlflow-renamed2',
            tags={'mlflow.AML_EXPERIMENT_ID': '594197a2-c16e-4e14-8040-e398833198ff',
                  'mlflow.experimentType': 'MLFLOW_EXPERIMENT',
                  'mlflow.ownerEmail': '***.***@***.***',
                  'mlflow.ownerId': 'YYYYYY'}
I can delete the whole experiment using the Experiments UI, but if I try to delete any experiment using mlflow.delete_experiment() I get the same error as at the beginning. Either way, it does not solve the problem. Also, I cannot find the /adb directory anywhere; it is not in DBFS.
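For reference, this is roughly how I am locating the conflicting experiment and its AML tags (a minimal sketch; list_experiments() is the API in the MLflow version I'm on, newer versions use search_experiments() instead):

from mlflow.tracking import MlflowClient

client = MlflowClient()
# Print each experiment together with its AML experiment id tag, if present,
# to spot the one that conflicts on the AML side
for exp in client.list_experiments():
    aml_id = exp.tags.get('mlflow.AML_EXPERIMENT_ID')
    if aml_id is not None:
        print(exp.experiment_id, exp.name, aml_id)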
11-17-2021 05:01 AM
It seems that for every experiment I create, MLflow also creates an associated AML experiment, and all those AML experiments point to the same artifactLocation="" by default. It does not matter if you delete all experiments through the UI: the garbage collector detects that there is (or was) an AML experiment with artifactLocation="", so there is a conflict for any new experiment you try to log to.
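One thing I may try next is creating the experiment with an explicit artifact location, to sidestep the empty default (just a sketch; the DBFS path below is an assumption, not something from the error message):

import mlflow

# Give the experiment an explicit artifact location instead of the default;
# the dbfs path here is only an example
exp_id = mlflow.create_experiment(
    '/learning/Mlflow-Full-Example/test-mlflow-explicit',
    artifact_location='dbfs:/learning/mlflow-artifacts/test-mlflow-explicit',
)
mlflow.set_experiment('/learning/Mlflow-Full-Example/test-mlflow-explicit')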
11-17-2021 08:10 AM
Hi @Miguel Ángel Fernández, it's not recommended to link the Databricks and AML workspaces, as we have been seeing more problems with that setup. You can refer to the instructions below for using MLflow with AML: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow
You can refer to https://github.com/MicrosoftDocs/azure-docs/issues/80298 for how to unlink them.
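At a high level, the documented approach is to point MLflow at the AML tracking URI rather than linking the workspaces. A minimal sketch, assuming azureml-core and azureml-mlflow are installed and a config.json for the AML workspace is available:

import mlflow
from azureml.core import Workspace

# Load the AML workspace from a local config.json (downloadable from the Azure portal)
ws = Workspace.from_config()

# Route MLflow tracking to the AML workspace instead of Databricks
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment('test-mlflow')
with mlflow.start_run():
    mlflow.log_param('seed', 777)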
11-18-2021 04:22 AM
Hi Prabakar, thank you so much for your response. In the end, we decided to delete the Azure Machine Learning service, because the ARM template from the reference you provided throws an error.
I wonder if just redeploying the Azure Machine Learning service in the same resource group will be enough to set up both services properly, or whether they will end up linked again. I expect no MLflow communication between Databricks and the new Azure Machine Learning workspace, of course.
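In case it helps anyone else, the workspace can also be deleted from the SDK rather than the portal. A minimal sketch, assuming azureml-core; all names below are placeholders:

from azureml.core import Workspace

# Placeholders: substitute your own subscription, resource group and workspace name
ws = Workspace.get(
    name='<aml-workspace-name>',
    subscription_id='<subscription-id>',
    resource_group='<resource-group>',
)
# Delete the workspace itself; dependent resources (storage, key vault, ...) are kept
ws.delete(delete_dependent_resources=False, no_wait=False)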
12-01-2021 08:01 AM
Hi!
I am facing the same problem with linked workspaces and wonder whether you managed to solve it by unlinking them.