10-17-2023 12:19 PM
Here is my model logging code.
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run() as run:
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="gpt2",
        registered_model_name=registered_model_name,
        input_example=input_example,
        signature=signature,
        task="text-generation",
        inference_config=inference_config,
        await_registration_for=60 * 60,
    )
And here is my registration code:
mlflow.set_registry_uri("databricks-uc")
mlflow.set_tracking_uri("databricks")

result = mlflow.register_model(
    model_uri="runs:/" + run.info.run_id + "/model",
    name=registered_name,
    await_registration_for=1000,
)
Here is the full traceback, lightly edited.
MlflowException                           Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:483, in UcModelRegistryStore._local_model_dir(self, source, local_model_path)
    482 try:
--> 483     local_model_dir = mlflow.artifacts.download_artifacts(
    484         artifact_uri=source, tracking_uri=self.tracking_uri
    485     )
    486 except Exception as e:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/artifacts/__init__.py:60, in download_artifacts(artifact_uri, run_id, artifact_path, dst_path, tracking_uri)
     59 if artifact_uri is not None:
---> 60     return _download_artifact_from_uri(artifact_uri, output_path=dst_path)
     62 artifact_path = artifact_path if artifact_path is not None else ""

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/artifact_utils.py:100, in _download_artifact_from_uri(artifact_uri, output_path)
     99 root_uri, artifact_path = _get_root_uri_and_artifact_path(artifact_uri)
--> 100 return get_artifact_repository(artifact_uri=root_uri).download_artifacts(
    101     artifact_path=artifact_path, dst_path=output_path
    102 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/artifact/artifact_repo.py:221, in ArtifactRepository.download_artifacts(self, artifact_path, dst_path)
    218 failures = "\n".join(
    219     template.format(path=path, error=error) for path, error in failed_downloads.items()
    220 )
--> 221 raise MlflowException(
    222     message=(
    223         "The following failures occurred while downloading one or more"
    224         f" artifacts from {self.artifact_uri}:\n{_truncate_error(failures)}"
    225     )
    226 )
    228 return os.path.join(dst_path, artifact_path)

MlflowException: The following failures occurred while downloading one or more artifacts from dbfs:/databricks/mlflow-tracking/.../artifacts:
##### File model #####
404 Client Error: Not Found for url: https://$DATABRICKSURL/8188181812650195.jobs/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model... Response text: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>$PATH/8188181812650195.jobs/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model</Key><RequestId>$REQUESTID</RequestId><HostId>$HOSTID</HostId></Error>

The above exception was the direct cause of the following exception:

MlflowException                           Traceback (most recent call last)
File <command-2982154088058438>, line 76
---> 76 result = mlflow.register_model(
     77     "runs:/"+run.info.run_id+"/model",
     78     name=registered_name,
     79     await_registration_for=1000,
     80 )
     82 from mlflow import MlflowClient
     83 client = MlflowClient(registry_uri="databricks-uc")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/fluent.py:73, in register_model(model_uri, name, await_registration_for, tags)
     17 def register_model(
     18     model_uri,
     19     name,
   (...)
     22     tags: Optional[Dict[str, Any]] = None,
     23 ) -> ModelVersion:
     24     """
     25     Create a new model version in model registry for the model files specified by ``model_uri``.
     26     Note that this method assumes the model registry backend URI is the same as that of the
   (...)
     71         Version: 1
     72     """
---> 73     return _register_model(
     74         model_uri=model_uri, name=name, await_registration_for=await_registration_for, tags=tags
     75     )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/fluent.py:108, in _register_model(model_uri, name, await_registration_for, tags, local_model_path)
    105 source = RunsArtifactRepository.get_underlying_uri(model_uri)
    106 (run_id, _) = RunsArtifactRepository.parse_runs_uri(model_uri)
--> 108 create_version_response = client._create_model_version(
    109     name=name,
    110     source=source,
    111     run_id=run_id,
    112     tags=tags,
    113     await_creation_for=await_registration_for,
    114     local_model_path=local_model_path,
    115 )
    116 eprint(
    117     f"Created version '{create_version_response.version}' of model "
    118     f"'{create_version_response.name}'."
    119 )
    120 return create_version_response

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/client.py:2575, in MlflowClient._create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
   2567 # NOTE: we can't easily delete the target temp location due to the async nature
   2568 # of the model version creation - printing to let the user know.
   2569 eprint(
   2570     f"=== Source model files were copied to {new_source}"
   2571     + " in the model registry workspace. You may want to delete the files once the"
   2572     + " model version is in 'READY' status. You can also find this location in the"
   2573     + " `source` field of the created model version. ==="
   2574 )
-> 2575 return self._get_registry_client().create_model_version(
   2576     name=name,
   2577     source=new_source,
   2578     run_id=run_id,
   2579     tags=tags,
   2580     run_link=run_link,
   2581     description=description,
   2582     await_creation_for=await_creation_for,
   2583     local_model_path=local_model_path,
   2584 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/client.py:196, in ModelRegistryClient.create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
    194 arg_names = _get_arg_names(self.store.create_model_version)
    195 if "local_model_path" in arg_names:
--> 196     mv = self.store.create_model_version(
    197         name,
    198         source,
    199         run_id,
    200         tags,
    201         run_link,
    202         description,
    203         local_model_path=local_model_path,
    204     )
    205 else:
    206     # Fall back to calling create_model_version without
    207     # local_model_path since old model registry store implementations may not
    208     # support the local_model_path argument.
    209     mv = self.store.create_model_version(name, source, run_id, tags, run_link, description)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:545, in UcModelRegistryStore.create_model_version(self, name, source, run_id, tags, run_link, description, local_model_path)
    543 extra_headers = {_DATABRICKS_LINEAGE_ID_HEADER: header_base64}
    544 full_name = get_full_name_from_sc(name, self.spark)
--> 545 with self._local_model_dir(source, local_model_path) as local_model_dir:
    546     self._validate_model_signature(local_model_dir)
    547     feature_deps = get_feature_dependencies(local_model_dir)

File /usr/lib/python3.10/contextlib.py:135, in _GeneratorContextManager.__enter__(self)
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError("generator didn't yield") from None

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:487, in UcModelRegistryStore._local_model_dir(self, source, local_model_path)
    483     local_model_dir = mlflow.artifacts.download_artifacts(
    484         artifact_uri=source, tracking_uri=self.tracking_uri
    485     )
    486 except Exception as e:
--> 487     raise MlflowException(
    488         f"Unable to download model artifacts from source artifact location "
    489         f"'{source}' in order to upload them to Unity Catalog. Please ensure "
    490         f"the source artifact location exists and that you can download from "
    491         f"it via mlflow.artifacts.download_artifacts()"
    492     ) from e
    493 # Clean up temporary model directory at end of block. We assume a temporary
    494 # model directory was created if the `source` is not a local path (must be downloaded
    495 # from remote to a temporary directory)
    496 yield local_model_dir

MlflowException: Unable to download model artifacts from source artifact location 'dbfs:/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model' in order to upload them to Unity Catalog. Please ensure the source artifact location exists and that you can download from it via mlflow.artifacts.download_artifacts()
When I open the DBFS file browser, I don't see any folder called 'databricks', so I decided to look through it with terminal commands. When I run `%ls /dbfs/databricks/` I can see two directories: mlflow-registry and mlflow-tracking. When I run `%ls /dbfs/databricks/mlflow-tracking/` or `%ls /dbfs/databricks/mlflow-registry/`, though, I get this error: mount.err*. Granted, I didn't try this with a Unity Catalog enabled cluster, but I don't think I need one to browse DBFS. Also, at no point in the process do I mount a directory, but we are using Databricks through AWS, so that connection is probably where things are going wrong.

I then tried the full path straight from the error message: `%ls /dbfs/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model` and got the error: ls: cannot access '/dbfs/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model': No such file or directory, which suggests that the file path really does not exist.

From here, though, I'm at a loss for what to do. I followed the Databricks example code located here and it worked, but with my model things go wrong. I'm out of ideas about where to go next, so I'd really appreciate any and all tips.
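Rather than browsing /dbfs directly, one way to check what a run actually contains is to ask the MLflow tracking server. The sketch below is only an illustration: the helper names `top_level_artifact_paths` and `explain_missing_artifact` are hypothetical, and it assumes `client` is an `MlflowClient` (or anything with a compatible `list_artifacts(run_id)` method) pointed at the same tracking server as the run.

```python
def top_level_artifact_paths(client, run_id):
    """Return the top-level artifact paths recorded for a run.

    `client` is assumed to be an mlflow.MlflowClient, whose
    list_artifacts(run_id) returns objects with a `.path` attribute.
    """
    return [info.path for info in client.list_artifacts(run_id)]


def explain_missing_artifact(client, run_id, expected="model"):
    """Report whether `expected` is among the run's top-level artifacts."""
    paths = top_level_artifact_paths(client, run_id)
    if expected in paths:
        return f"'{expected}' exists under run {run_id}"
    return f"'{expected}' not found under run {run_id}; logged paths: {paths}"


# Possible usage in the notebook (not executed here):
# from mlflow import MlflowClient
# print(explain_missing_artifact(MlflowClient(), run.info.run_id))
```

If the report shows a directory other than `model`, the 404 in the traceback would be consistent with the registration call pointing at a path that was never logged.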
10-20-2023 04:28 AM
Hi @AChang, I suggest creating a new Databricks cluster and running your code to see if the issue is specific to your current cluster configuration.
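It may also be worth comparing the two snippets in the question: the logging call uses `artifact_path="gpt2"`, while the registration call builds its URI as `"runs:/" + run.info.run_id + "/model"`. If nothing was logged under `model`, that mismatch alone could explain the NoSuchKey error. A minimal sketch of keeping the two in sync follows; `ARTIFACT_PATH`, `build_model_uri` and `register_logged_model` are hypothetical names, and whether this is the actual cause here is an assumption.

```python
ARTIFACT_PATH = "gpt2"  # must match the artifact_path passed to log_model(...)


def build_model_uri(run_id, artifact_path=ARTIFACT_PATH):
    """Build a runs:/ URI that points at the directory log_model wrote to."""
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def register_logged_model(run_id, registered_name):
    """Register the model logged under ARTIFACT_PATH in Unity Catalog."""
    import mlflow  # imported lazily so build_model_uri works without mlflow

    mlflow.set_registry_uri("databricks-uc")
    return mlflow.register_model(
        model_uri=build_model_uri(run_id),
        name=registered_name,
        await_registration_for=1000,
    )


# Possible usage in the notebook (not executed here):
# result = register_logged_model(run.info.run_id, registered_name)
```

Note that the logging call already passes `registered_model_name`, which registers a model version as part of `log_model`, so the separate `register_model` call may not be needed at all.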