Databricks Community

maranBH · 11-24-2021

Hi all,

We are constructing our CI/CD pipelines with the Repos feature following this guide:

https://databricks.com/blog/2021/09/20/part-1-implementing-ci-cd-on-databricks-using-databricks-note...

I'm trying to implement my pipelines for models that haven't been trained with the MLflow paradigm. Code promotion with Repos works the same way as explained in the link.

With MLflow, however, the model artifact (the serialized file that contains the trained model) travels through MLflow's own infrastructure, its artifact version control, which points either to the Databricks FileStore or to any other storage configured beforehand.

WITHOUT the MLflow scheme, what are good practices for promoting the model artifact? Here are my thoughts:

- Save it on the data lake and read it from the same path in every environment. Obstacle: how do I add version control here?
- When the release pipeline triggers, include it in the repository artifact.
- Any other options?
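The first option can get basic version control without MLflow by never overwriting a published file and keeping a small pointer per environment. This is a minimal sketch, assuming the lake is reachable through a local file path (for example a FUSE path under /dbfs/ on Databricks); the folder layout, the `envs/<env>.json` pointer files, and the function names `publish_model`, `promote`, and `resolve` are hypothetical, not an existing Databricks or MLflow API.

```python
import hashlib
import json
import shutil
from pathlib import Path

# Hypothetical layout:
#   <store_root>/<model_name>/v<N>/<model file> + manifest.json  (immutable)
#   <store_root>/<model_name>/envs/<env>.json                    (mutable pointer)


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def publish_model(model_file: Path, store_root: Path, model_name: str) -> int:
    """Copy a serialized model into a new, never-overwritten version folder."""
    model_dir = store_root / model_name
    model_dir.mkdir(parents=True, exist_ok=True)
    # Existing versions are folders named v1, v2, ...; the next one is max + 1.
    versions = [int(p.name[1:]) for p in model_dir.iterdir()
                if p.is_dir() and p.name.startswith("v") and p.name[1:].isdigit()]
    version = max(versions, default=0) + 1
    target = model_dir / f"v{version}"
    # mkdir without exist_ok raises if another run already took this version.
    target.mkdir()
    shutil.copy2(model_file, target / model_file.name)
    manifest = {"version": version, "file": model_file.name,
                "sha256": _sha256(model_file)}
    (target / "manifest.json").write_text(json.dumps(manifest))
    return version


def promote(store_root: Path, model_name: str, version: int, env: str) -> None:
    """Point an environment (e.g. 'staging', 'prod') at an existing version."""
    if not (store_root / model_name / f"v{version}").is_dir():
        raise ValueError(f"{model_name} v{version} does not exist")
    pointer = store_root / model_name / "envs" / f"{env}.json"
    pointer.parent.mkdir(exist_ok=True)
    pointer.write_text(json.dumps({"version": version}))


def resolve(store_root: Path, model_name: str, env: str) -> Path:
    """Return the model file an environment should load, verifying its hash."""
    pointer = store_root / model_name / "envs" / f"{env}.json"
    version = json.loads(pointer.read_text())["version"]
    target = store_root / model_name / f"v{version}"
    manifest = json.loads((target / "manifest.json").read_text())
    model_path = target / manifest["file"]
    if _sha256(model_path) != manifest["sha256"]:
        raise RuntimeError(f"checksum mismatch for {model_path}")
    return model_path
```

A release pipeline would then call `promote` for the target environment instead of copying files, so dev, staging, and prod all read from the same store and differ only in which version their pointer names.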

I'm really confused about the solution.

sean_owen · 01-05-2022

So you are managing your models with MLflow, and want to include them in a git repository?

You can do that in a CI/CD process: it would run the mlflow CLI to copy the model you want (e.g. models:/my_model/production) into a git checkout and then commit it, perhaps.
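A minimal sketch of the copy-and-commit step, assuming the model has already been downloaded from MLflow into a local folder (the download command itself is not reproduced here). The function name `vendor_model_into_repo`, the `models/my_model` destination folder, and the commit message are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def vendor_model_into_repo(downloaded_model_dir: Path, repo_checkout: Path,
                           dest_subdir: str = "models/my_model",
                           commit: bool = True) -> List[str]:
    """Copy a downloaded model folder into a git checkout and optionally commit it.

    Returns the copied files as repo-relative POSIX paths.
    """
    dest = repo_checkout / dest_subdir
    if dest.exists():
        # Replace the previously committed copy so stale files do not linger.
        shutil.rmtree(dest)
    # copytree also creates any missing parent folders of dest.
    shutil.copytree(downloaded_model_dir, dest)
    copied = sorted(p.relative_to(repo_checkout).as_posix()
                    for p in dest.rglob("*") if p.is_file())
    if commit:
        # Stage and commit only the model folder; check=True raises on git errors.
        subprocess.run(["git", "add", "--", dest_subdir],
                       cwd=repo_checkout, check=True)
        subprocess.run(["git", "commit", "-m", f"Update model in {dest_subdir}"],
                       cwd=repo_checkout, check=True)
    return copied
```

If the model did not change, `git commit` exits non-zero and the call raises `CalledProcessError`, which a pipeline may want to catch. Committing large binaries directly also grows the repository history with every version; Git LFS is a common alternative for that.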

You can also treat MLflow as the 'source control' at runtime and have the production jobs refer to and access the latest model from MLflow directly.

Maybe I don't understand what you're trying to achieve here and you can clarify.


maranBH · 12-01-2021

Hi @Kaniz Fatma, thanks. Is there any advice you can give me for this particular problem?

Anonymous · 12-09-2021

@Rodrigo Maranzana - Thank you for your patience. I am looking for someone to help you.

Trained model artifact, CI/CD and Databricks without MLFlow.
