Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

ML model promotion from Databricks dev workspace to prod workspace

datastones
Contributor

Hi everybody. I am relatively new to Databricks and am working on an ML model promotion process between different Databricks workspaces. I am aware that the best practice is deployment as code (e.g. exporting the whole training pipeline and model registration from the dev to the prod workspace, either via Terraform export or Databricks Asset Bundles). However, since the model training process is extremely expensive, we only want to promote and register the model to the prod workspace and build our inference pipeline there. I have been considering the following options:

  • Use the open-source MLflow Export-Import tool: this method didn't pass our firm's security review because of the shared DBFS.

  • I am aware that models registered in Databricks Unity Catalog (UC) in the prod workspace can be loaded from the dev workspace for model comparison/debugging. But to comply with best practices, we restrict access from the prod workspace to UC assets in the dev workspace.

  • Use a remote registry as the means to share models: this is part of the legacy approach. The current recommendation from Databricks is to use Unity Catalog (UC) for managing the ML lifecycle.

My current workaround, run via GitHub Actions, is as follows:

  1. Use Terraform's experimental resource exporter to export the configuration of the model registered in UC in the dev workspace. This config gives me the S3 location of the model that I want to promote to the prod workspace.

  2. Using the S3 location from step 1, a second workflow copies the model object from dev's S3 bucket to prod's S3 bucket.

  3. Once the model is copied into prod's S3 bucket, I use Terraform to register the model (as a resource) in prod's UC.
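For readers who want to picture step 2, here is a minimal sketch of copying everything under one S3 prefix to another with boto3. It is not the workflow actually used in this thread: the function names and the bucket and prefix arguments are hypothetical, and it assumes a single set of AWS credentials can read the dev bucket and write to the prod bucket.

```python
from typing import Dict, Iterable


def plan_copy(keys: Iterable[str], src_prefix: str, dst_prefix: str) -> Dict[str, str]:
    """Map each source key under src_prefix to its destination key under dst_prefix."""
    src = src_prefix.rstrip("/") + "/"
    dst = dst_prefix.rstrip("/") + "/"
    # Keys outside the source prefix are ignored rather than copied.
    return {key: dst + key[len(src):] for key in keys if key.startswith(src)}


def copy_model_prefix(src_bucket: str, src_prefix: str,
                      dst_bucket: str, dst_prefix: str) -> int:
    """Copy every object under src_prefix to dst_prefix; return the object count."""
    import boto3  # imported lazily so plan_copy stays dependency-free

    s3 = boto3.client("s3")  # assumption: these credentials can see both buckets
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=src_bucket,
                                   Prefix=src_prefix.rstrip("/") + "/"):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    plan = plan_copy(keys, src_prefix, dst_prefix)
    for src_key, dst_key in plan.items():
        # Managed copy: boto3 switches to multipart for large model files.
        s3.copy({"Bucket": src_bucket, "Key": src_key}, dst_bucket, dst_key)
    return len(plan)
```

Keeping the key mapping in a pure function makes it easy to check the planned destination layout in CI before any object is actually copied.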

I am wondering if there is a simpler way to directly promote an ML model from one workspace to another. Thank you very much for your help!

2 ACCEPTED SOLUTIONS

amr
Databricks Employee
  • I am aware that models registered in Databricks Unity Catalog (UC) in the prod workspace can be loaded from dev workspace for model comparison/debugging. But to comply with best practices, we restrict access to assets in UC in the dev workspace from prod workspace.


It should be the opposite: the prod workspace can see the catalogs of the dev workspace. Even better, push these models to a special catalog, model_registery.models, make it visible in both dev and prod, and keep nothing in it but the models, no data.
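A rough sketch of what the shared-catalog idea could look like with the MLflow client, assuming model_registery.models is read as catalog.schema, that both workspaces are attached to the same Unity Catalog metastore, and that the caller has the required privileges on that schema. The model name demo_model and the helper names are hypothetical.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Build the three-level Unity Catalog model name catalog.schema.model."""
    parts = (catalog, schema, model)
    if any(not part or "." in part for part in parts):
        raise ValueError("each name part must be non-empty and contain no dots")
    return ".".join(parts)


def register_in_shared_catalog(source_uri: str, full_name: str):
    """Run in the dev workspace: register a trained model in the shared catalog."""
    import mlflow  # imported lazily; requires the mlflow package

    mlflow.set_registry_uri("databricks-uc")  # use the Unity Catalog registry
    model_version = mlflow.register_model(source_uri, full_name)
    return model_version.version


def load_from_shared_catalog(full_name: str, version):
    """Run in the prod workspace: load one specific registered version."""
    import mlflow.pyfunc

    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(f"models:/{full_name}/{version}")


# Hypothetical model name inside the catalog suggested above.
SHARED_MODEL = uc_model_name("model_registery", "models", "demo_model")
```

With this layout nothing is copied between workspaces: dev writes a new version into the shared catalog and prod reads it, so the expensive training run happens only once.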


Hi amr, thank you very much for your input. Having a single UC catalog for the models, as you suggested, with appropriate tags and aliases, is something I could try. I posted the same question on Reddit and the consensus there was the same: have a dedicated UC catalog for the models.

https://www.reddit.com/r/databricks/comments/1dtur0z/ml_model_promotion_from_databricks_dev_workspac...
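As an illustration of the tags-and-alias idea, the sketch below marks one registered version as ready for prod and builds the alias-based URI that an inference job could load. The alias champion, the tag contents, and the helper names are assumptions for the example, not something prescribed in this thread.

```python
from typing import Dict, Optional


def alias_uri(full_name: str, alias: str) -> str:
    """Build a models:/ URI that resolves to whichever version holds the alias."""
    return f"models:/{full_name}@{alias}"


def mark_for_prod(full_name: str, version, alias: str = "champion",
                  tags: Optional[Dict[str, str]] = None) -> str:
    """Tag a model version and point the alias at it; return the URI to load."""
    from mlflow import MlflowClient  # imported lazily; requires the mlflow package

    client = MlflowClient(registry_uri="databricks-uc")
    for key, value in (tags or {}).items():
        client.set_model_version_tag(
            name=full_name, version=str(version), key=key, value=value
        )
    # Moving the alias is the promotion step: consumers of the URI follow it.
    client.set_registered_model_alias(full_name, alias, str(version))
    return alias_uri(full_name, alias)
```

An inference pipeline in prod could then call mlflow.pyfunc.load_model on the returned URI, so promoting a new version only means moving the alias rather than redeploying the pipeline.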

