
Register Model mounted in S3

SOlivero

Hello!

I'm having an issue registering a model saved in a mounted S3 bucket using mlflow.

Let me give a little bit more context:

1. First, I mounted my S3 bucket, with all the required IAM permissions in place:

s3_bucket_name = f"s3a://{s3_bucket}"
dbutils.fs.mount(source=s3_bucket_name, mount_point=f"/mnt/{s3_bucket}")


2. Then I created an experiment whose artifact location points into that mount:

artifact_location = f"dbfs:/mnt/{s3_bucket}/experiments/{model_name}"
mlflow.create_experiment(experiment_name, artifact_location=artifact_location)

3. I started a run and logged my models without registering them. I log them with both mlflow and FeatureStoreClient() (I get exactly the same problem with either logging method).

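import mlflow
from databricks.feature_store import FeatureStoreClient

mlflow.start_run(run_name=run_name)

# Plain MLflow pyfunc model
mlflow.pyfunc.log_model("model_mlflow", python_model=model)

# Same model logged through the Feature Store client
fs = FeatureStoreClient()
fs.log_model(model=model,
             artifact_path="model_feature_store",
             flavor=flavor,
             training_set=training_set)

mlflow.end_run()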
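Steps 1 and 2 refer to the same bucket in three forms: the s3a:// mount source, the /mnt/... mount point, and the dbfs:/mnt/... artifact location. The sketch below rebuilds those strings from the bucket name so they can be compared side by side. `experiment_paths` and `ExperimentPaths` are hypothetical names, and the code only builds strings (it does not call dbutils or MLflow); "my-model" is a placeholder.

```python
from typing import NamedTuple


class ExperimentPaths(NamedTuple):
    mount_source: str       # passed to dbutils.fs.mount(source=...)
    mount_point: str        # passed to dbutils.fs.mount(mount_point=...)
    artifact_location: str  # passed to mlflow.create_experiment(artifact_location=...)


def experiment_paths(s3_bucket: str, model_name: str) -> ExperimentPaths:
    """Rebuild the strings from steps 1 and 2 (hypothetical helper, strings only)."""
    mount_point = f"/mnt/{s3_bucket}"
    return ExperimentPaths(
        mount_source=f"s3a://{s3_bucket}",
        mount_point=mount_point,
        # Prefixing the mount point with "dbfs:" addresses the same location as a dbfs: URI.
        artifact_location=f"dbfs:{mount_point}/experiments/{model_name}",
    )


paths = experiment_paths("my-s3-bucket", "my-model")
print(paths.mount_source)       # s3a://my-s3-bucket
print(paths.mount_point)        # /mnt/my-s3-bucket
print(paths.artifact_location)  # dbfs:/mnt/my-s3-bucket/experiments/my-model
```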

Up to this point, everything works fine.

I can find the models in DBFS in:

  • dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_feature_store
  • dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_mlflow
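Both paths follow the pattern `<artifact_location>/<run_id>/artifacts/<artifact_path>`, which, as far as I understand, is how MLflow lays out a run's artifacts under the experiment's artifact location. A minimal sketch of that layout, assuming exactly this pattern; `run_artifact_path` is a hypothetical helper, not an MLflow function, and "run_id" is the same placeholder used in the paths above.

```python
import posixpath


def run_artifact_path(artifact_location: str, run_id: str, artifact_path: str) -> str:
    """Hypothetical helper: <artifact_location>/<run_id>/artifacts/<artifact_path>."""
    # Strip slashes so posixpath.join never treats a component as absolute,
    # which would silently discard everything joined before it.
    return posixpath.join(
        artifact_location.rstrip("/"),
        run_id.strip("/"),
        "artifacts",
        artifact_path.strip("/"),
    )


root = "dbfs:/mnt/my-s3-bucket/my-experiment"
print(run_artifact_path(root, "run_id", "model_feature_store"))
# dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_feature_store
print(run_artifact_path(root, "run_id", "model_mlflow"))
# dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_mlflow
```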

I can find the models in my S3 bucket too.

I can also load the MLflow model from my notebook with load_model() without any problem:

model_mlflow = mlflow.pyfunc.load_model(model_uri=f"runs:/{active_run_id}/model_mlflow")
model_mlflow

I can also run fs.score_batch() with the Feature Store model without any problem:
fs.score_batch(model_uri=f"runs:/{active_run_id}/model_feature_store", df=df)
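Both calls address the models through a runs:/ URI, which names a run ID plus an artifact path relative to that run's artifacts directory. A sketch of how such a URI splits into those two parts; `parse_runs_uri` is a hypothetical helper built on the standard library only, not MLflow's own resolver, and "0123abcd" is a placeholder run ID.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_runs_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'runs:/<run_id>/<artifact_path>' into its parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "runs":
        raise ValueError(f"not a runs:/ URI: {uri!r}")
    # "runs:/abc/model" parses with an empty netloc and path "/abc/model";
    # the first path segment is the run ID, the remainder is the artifact path.
    run_id, _, artifact_path = parsed.path.lstrip("/").partition("/")
    if not run_id or not artifact_path:
        raise ValueError(f"expected runs:/<run_id>/<artifact_path>, got {uri!r}")
    return run_id, artifact_path


active_run_id = "0123abcd"  # placeholder run ID
print(parse_runs_uri(f"runs:/{active_run_id}/model_feature_store"))
# ('0123abcd', 'model_feature_store')
```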

The problem comes at the final step, when I try to register the model. If I either:

  • pass the registered_model_name parameter to any of the log_model() functions,
  • register either model with:
    • mlflow.register_model(model_uri=f"runs:/{active_run_id}/model_mlflow", name=model_name)
  • or register the model from the UI with the Register Model button,

I get the exact same error:

MlflowException: Model version creation failed for model name: model version: 4 with status: FAILED_REGISTRATION and message:
Failed registration. The given source path `dbfs:/mnt/<my-s3-bucket>/<my-experiment-name>/<run_id>/artifacts/model_mlflow` does not exist.


If I follow exactly the same steps but point the experiment at a DBFS path outside the mounted S3 bucket, registration works with no problem at all. The error only happens when the experiment's artifact location is inside the S3 mount.
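Since the failure only appears when the artifact location is under the mount, it may help while debugging to write down which S3 URI the reported dbfs:/mnt/... path actually stands for. The sketch below resolves a path against a mount table by longest matching mount point. `resolve_mount` is hypothetical, and `MountInfo` here is a stand-in that only mimics the `mountPoint` and `source` fields of the entries returned by `dbutils.fs.mounts()`. It does plain string matching and says nothing about why registration fails.

```python
from typing import Iterable, NamedTuple, Optional


class MountInfo(NamedTuple):
    """Stand-in with the mountPoint/source fields of dbutils.fs.mounts() entries."""
    mountPoint: str
    source: str


def resolve_mount(dbfs_path: str, mounts: Iterable[MountInfo]) -> Optional[str]:
    """Hypothetical helper: map a dbfs:/mnt/... path to the storage URI behind its mount."""
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    best_point, best_source = "", None
    for m in mounts:
        point = m.mountPoint.rstrip("/")
        if not point:
            continue  # a "/" mount point would match every path; skip it
        # Match the mount point itself or a path beneath it, not a sibling like /mnt/bucket2.
        if (path == point or path.startswith(point + "/")) and len(point) > len(best_point):
            best_point, best_source = point, m.source.rstrip("/")
    if best_source is None:
        return None  # path is not under any listed mount
    return best_source + path[len(best_point):]


mounts = [MountInfo("/mnt/my-s3-bucket", "s3a://my-s3-bucket")]
print(resolve_mount(
    "dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_mlflow", mounts))
# s3a://my-s3-bucket/my-experiment/run_id/artifacts/model_mlflow
```

On a cluster, the list from `dbutils.fs.mounts()` exposes the same two field names, so it could presumably be passed in place of the hand-built `mounts` list.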

Why can't I register a model in Databricks when it is saved in dbfs:/mnt/S3-bucket?

Thank you in advance!! 

 

 

 
