Hi everybody. I am relatively new to Databricks. I am working on an ML model promotion process between different Databricks workspaces. I am aware that the best practice is deployment as code (e.g. exporting the whole training pipeline and model registration from the dev workspace to the prod workspace, either via Terraform export or Databricks Asset Bundles). However, since the model training process is extremely expensive, we only want to promote and register the already-trained model to the prod workspace and build our inference pipeline there. I have been considering the following options:
Use the open-source MLflow Export-Import tool: this method didn't pass our firm's security review because of its reliance on shared DBFS.
I am aware that models registered in Databricks Unity Catalog (UC) in the prod workspace can be loaded from the dev workspace for model comparison/debugging (a rough sketch of this follows these options). But to comply with best practices, we restrict access from the prod workspace to UC assets in the dev workspace.
Use a remote model registry as the means to share models: this is part of the legacy Workspace Model Registry approach; the current guidance from Databricks is to use Unity Catalog (UC) for managing the ML lifecycle.
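For completeness, the cross-workspace load I mentioned in the second option looks roughly like this ("prod" is a placeholder Databricks CLI profile name and the three-level model name is a placeholder too):

```python
import mlflow

# Point the MLflow registry at the other workspace's Unity Catalog;
# "prod" is a placeholder Databricks CLI profile name.
mlflow.set_registry_uri("databricks-uc://prod")

# Load a prod-registered UC model version from the dev workspace.
model = mlflow.pyfunc.load_model("models:/prod_catalog.ml_models.my_model/1")
```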
My current workaround, via GitHub Actions, is as follows:
1. Use Terraform's experimental resource exporter to export the configuration of the model registered in UC in the dev workspace. This config gives me the S3 location of the model that I want to promote to the prod workspace (see the first sketch after these steps).
2. Using the S3 location output from step 1, I can have another workflow copy the model artifacts from dev's S3 bucket to prod's S3 bucket (second sketch below).
3. Once I have the model copied into prod's S3 bucket, I can then use Terraform to register the model as a resource in prod's UC (third sketch below).
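For context on step 1: what I ultimately need from the exporter is the model version's storage location. An equivalent lookup via the MLflow client would look roughly like this (the "dev" CLI profile, the three-level model name, and the version are placeholders):

```python
import mlflow
from mlflow import MlflowClient

# Point the MLflow registry at the dev workspace's Unity Catalog.
# "dev" is a placeholder Databricks CLI profile name.
mlflow.set_registry_uri("databricks-uc://dev")

client = MlflowClient()
mv = client.get_model_version(
    name="dev_catalog.ml_models.my_model",  # placeholder 3-level UC name
    version="1",                            # placeholder version
)

# For UC-registered models this resolves to the s3:// path backing
# the model version's artifacts.
print(mv.source)
print(client.get_model_version_download_uri(mv.name, mv.version))
```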
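Step 2 is just an object copy under a prefix. A minimal boto3 sketch, assuming the CI role can read the dev bucket and write to the prod bucket (bucket names and prefixes are placeholders; in practice they come from step 1's output):

```python
import boto3

# Placeholder bucket/prefix values.
SRC_BUCKET = "dev-metastore-bucket"
DST_BUCKET = "prod-metastore-bucket"
SRC_PREFIX = "models/my_model/version-1/"
DST_PREFIX = "imported-models/my_model/version-1/"

s3 = boto3.client("s3")

# Copy every object under the source prefix to the destination prefix.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix=SRC_PREFIX):
    for obj in page.get("Contents", []):
        dst_key = DST_PREFIX + obj["Key"][len(SRC_PREFIX):]
        s3.copy_object(
            Bucket=DST_BUCKET,
            Key=dst_key,
            CopySource={"Bucket": SRC_BUCKET, "Key": obj["Key"]},
        )
```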
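For step 3, instead of (or alongside) Terraform, the version itself could be registered from the copied artifacts with the MLflow client. A sketch, assuming the prod workspace can read the target S3 path (e.g. through an external location); the "prod" profile and all names are placeholders:

```python
import mlflow
from mlflow import MlflowClient
from mlflow.exceptions import MlflowException

# Point the MLflow registry at the prod workspace's Unity Catalog.
# "prod" is a placeholder Databricks CLI profile name.
mlflow.set_registry_uri("databricks-uc://prod")

client = MlflowClient()
model_name = "prod_catalog.ml_models.my_model"  # placeholder 3-level UC name

# Create the registered model once; ignore the error if it already exists.
try:
    client.create_registered_model(model_name)
except MlflowException:
    pass

# Register a new version pointing at the artifacts copied in step 2.
mv = client.create_model_version(
    name=model_name,
    source="s3://prod-metastore-bucket/imported-models/my_model/version-1/",
)
print(f"Registered {model_name} version {mv.version}")
```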
I am wondering if there is any simpler way to directly promote an ML model from one workspace to another. Thank you very much for your help!