
Register Model mounted in S3

SOlivero
New Contributor III

Hello!

I'm having an issue registering a model with MLflow when its artifacts are saved in a mounted S3 bucket.

Let me give a little bit more context:

1. First, I mounted my S3 bucket with all the corresponding IAM permissions:

s3_bucket_name = f"s3a://{s3_bucket}"
dbutils.fs.mount(source=s3_bucket_name, mount_point=f"/mnt/{s3_bucket}")


2. Then I created an experiment whose artifact location points to that mount:

artifact_location = f"dbfs:/mnt/{s3_bucket}/experiments/{model_name}"
mlflow.create_experiment(experiment_name, artifact_location=artifact_location)

3. I started my run and logged my models without registering them. I log one model with mlflow and one with FeatureStoreClient(), and I hit the exact same problem with either logging method.

mlflow.start_run(run_name=run_name)

mlflow.pyfunc.log_model("model_mlflow", python_model=model)

fs = FeatureStoreClient()
fs.log_model(
    model=model,
    artifact_path="model_feature_store",
    flavor=flavor,
    training_set=training_set,
)

mlflow.end_run()

Up to this point, everything works fine.

I can find the models in DBFS in:

  • dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_feature_store
  • dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_mlflow

I can find the models in my S3 bucket too.
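
For reference, the two locations above follow one pattern: the experiment's artifact location, then the run ID, then `artifacts`, then the artifact path passed to log_model(). Below is a minimal sketch of that pattern, which can be compared against the path reported in a registration error. The helper name `expected_artifact_uri` and the sample values are hypothetical, and the layout is inferred from the paths listed above, not taken from the MLflow documentation.

```python
def expected_artifact_uri(artifact_location: str, run_id: str, artifact_path: str) -> str:
    """Join the pieces in the same order as the DBFS paths listed above.

    Hypothetical helper: the layout is inferred from the observed paths.
    """
    base = artifact_location.rstrip("/")  # tolerate a trailing slash
    return f"{base}/{run_id}/artifacts/{artifact_path.strip('/')}"


if __name__ == "__main__":
    # Placeholder values mirroring the paths in the post.
    print(expected_artifact_uri(
        "dbfs:/mnt/my-s3-bucket/my-experiment", "run_id", "model_mlflow"
    ))
```

If the string built this way matches the source path in the error message, the registry is looking at the same location where the notebook finds the artifacts.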

I can load my mlflow model with load_model without any problem from my notebook:

model_mlflow = mlflow.pyfunc.load_model(model_uri=f"runs:/{active_run_id}/model_mlflow")
model_mlflow

I can also run fs.score_batch() with the Feature Store Client model without any problem:

fs.score_batch(model_uri=f"runs:/{active_run_id}/model_feature_store", df=df)

The problem comes at the final step, when I try to register my model. If I do any of the following:

  • Pass the registered_model_name parameter to any of the log_model() functions,
  • Or register any model with:
    • mlflow.register_model(model_uri=f"runs:/{active_run_id}/model_mlflow", name=model_name)
  • Or register my model from the UI with the Register Model button

I get the exact same error:

MlflowException: Model version creation failed for model name: model version: 4 with status: FAILED_REGISTRATION and message:
Failed registration. The given source path `dbfs:/mnt/<my-s3-bucket>/<my-experiment-name>/<run_id>/artifacts/model_mlflow` does not exist.
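
One way to double-check that the reported source path is visible from the notebook's cluster is to translate the `dbfs:/` URI into its local path and test it with the standard library. This is a minimal sketch that assumes the cluster exposes DBFS through the `/dbfs` local mount. The helper names are hypothetical, and the check only reflects what the cluster running the notebook can see, which may differ from what the registration step can see.

```python
import os


def dbfs_uri_to_local_path(uri: str) -> str:
    """Map 'dbfs:/a/b' to '/dbfs/a/b' (assumes the /dbfs local mount)."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


def source_exists_on_driver(uri: str) -> bool:
    """Return True if the translated path exists on this machine."""
    return os.path.exists(dbfs_uri_to_local_path(uri))


if __name__ == "__main__":
    # Placeholder path in the same shape as the one in the error message.
    uri = "dbfs:/mnt/my-s3-bucket/my-experiment/run_id/artifacts/model_mlflow"
    print(dbfs_uri_to_local_path(uri), source_exists_on_driver(uri))
```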


If I follow the exact same steps but save the experiment in a DBFS path outside the mounted S3 bucket, the registration step works with no problem at all. The failure only happens when the experiment's artifacts are saved inside the mounted S3 bucket.

Why can't I register a model in Databricks when it is saved in dbfs:/mnt/S3-bucket?

Thank you in advance!! 
