
Trained model artifact, CI/CD and Databricks without MLflow

maranBH
New Contributor III

Hi all,

We are constructing our CI/CD pipelines with the Repos feature following this guide:

https://databricks.com/blog/2021/09/20/part-1-implementing-ci-cd-on-databricks-using-databricks-note...

I'm trying to implement my pipelines for models that haven't been trained with the MLflow paradigm. The code promotion with Repos is the same as explained in the link.

However, with MLflow the model artifact (that is, the serialized file that contains the trained model) travels through MLflow's own infrastructure (the MLflow artifact version control). It points either to the Databricks FileStore or to any other storage configured beforehand.

WITHOUT the MLflow scheme: what are good practices for promoting the model artifact? Here are my thoughts:

  1. Save it on the data lake, and read it from the same location in every environment. Obstacle: how do I add version control here?
  2. When the release pipeline triggers, include it in the repository artifact.
  3. Any other option?

I'm really confused about the solution.

Accepted Solutions

sean_owen
Honored Contributor II

So you are managing your models with MLflow, and want to include them in a git repository?

You can do that in a CI/CD process; it would run the mlflow CLI to copy the model you want (e.g. models:/my_model/production) to a git checkout and then commit it, perhaps.
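
As a rough sketch of that CI/CD step, something like the following could work. It assumes the `mlflow artifacts download` command with its `--artifact-uri` and `--dst-path` options accepts a `models:/` URI in your MLflow version (worth checking with `mlflow artifacts download --help`), that `git` is on the PATH, and that the checkout already exists. The function names and the `model` subfolder are hypothetical.

```python
import subprocess
from pathlib import Path


def build_download_command(model_uri: str, dst_dir: Path) -> list[str]:
    """Build the mlflow CLI call that copies a model's files to dst_dir."""
    return [
        "mlflow", "artifacts", "download",
        "--artifact-uri", model_uri,
        "--dst-path", str(dst_dir),
    ]


def export_and_commit(model_uri: str, checkout: Path, subdir: str = "model") -> None:
    """Download the model into the git checkout and commit it.

    Note: `git commit` exits non-zero when nothing changed, so with
    check=True this raises if the same model was already committed.
    """
    dst = checkout / subdir
    dst.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(model_uri, dst), check=True)
    subprocess.run(["git", "-C", str(checkout), "add", subdir], check=True)
    subprocess.run(
        ["git", "-C", str(checkout), "commit", "-m", f"Add model artifact from {model_uri}"],
        check=True,
    )
```

A pipeline job might then call `export_and_commit("models:/my_model/Production", Path("path/to/checkout"))`. Large binary files in git can get heavy, so this probably only makes sense for small models.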

You can also treat MLflow as the 'source control' at runtime and have the production jobs refer to and access the latest model from MLflow directly.
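
For the runtime approach, a production job could resolve the model from the registry when it starts. This is a minimal sketch assuming MLflow is installed on the cluster and the model was registered under a name such as `my_model`; `registry_uri` and `load_registered_model` are hypothetical helper names, while `mlflow.pyfunc.load_model` is the MLflow call doing the work.

```python
def registry_uri(model_name: str, stage_or_version: str) -> str:
    """Build a Model Registry URI such as models:/my_model/Production."""
    return f"models:/{model_name}/{stage_or_version}"


def load_registered_model(model_name: str, stage_or_version: str = "Production"):
    """Load a registered model as a generic pyfunc model.

    MLflow is imported lazily so this module can be imported (and the URI
    helper tested) on machines where MLflow is not installed.
    """
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(registry_uri(model_name, stage_or_version))
```

The job would then call something like `load_registered_model("my_model").predict(df)`, so each environment picks up whatever version is currently in the requested stage without any artifact being copied around. Passing a version number instead of a stage pins the job to one exact model.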

Maybe I don't understand what you're trying to achieve here and you can clarify.


REPLIES

Kaniz
Community Manager

Hi @Rodrigo Maranzana! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise I will get back to you soon. Thanks.

maranBH
New Contributor III

Hi, thanks @Kaniz Fatma. Is there any advice you can give me for this particular problem?

Anonymous
Not applicable

@Rodrigo Maranzana - Thank you for your patience. I am looking for someone to help you.

