
Trained model artifact, CI/CD and Databricks without MLFlow.

maranBH
New Contributor III

Hi all,

We are constructing our CI/CD pipelines with the Repos feature following this guide:

https://databricks.com/blog/2021/09/20/part-1-implementing-ci-cd-on-databricks-using-databricks-note...

I'm trying to implement my pipelines for models that haven't been trained with the MLflow paradigm. The code promotion with Repos is the same as explained in the link.

However, with MLflow the model artifact (that is, the serialized file that contains the trained model) travels through MLflow's own infrastructure (the MLflow artifact version control). It points either to the Databricks FileStore or to any other storage configured beforehand.

WITHOUT the MLflow scheme: what are good practices for promoting the model artifact? Here are my thoughts:

  1. Save it on the data lake, and read it from the same path in every environment. Obstacle: how do I add version control here?
  2. When the release pipeline triggers, include it in the repository artifact.
  3. Any other?

I'm really confused about the solution.

1 ACCEPTED SOLUTION

sean_owen
Databricks Employee

So you are managing your models with MLflow, and want to include them in a git repository?

You can do that in a CI/CD process: it would run the mlflow CLI to copy the model you want (e.g. models:/my_model/Production) to a git checkout and then commit it, perhaps.
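
As a rough sketch of that CI/CD step, here is the same idea using the MLflow Python API instead of the CLI. It assumes MLflow is installed on the CI runner and that the tracking/registry URI and credentials are already configured through the environment. Note that the registry URI scheme is `models:/`. The function names, the `model_artifact` folder, and the commit message are hypothetical and only for illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Registry URI of the model to export (name and stage are placeholders).
MODEL_URI = "models:/my_model/Production"


def _mlflow_download(model_uri: str, dst_path: str) -> str:
    """Download the model files from MLflow into dst_path."""
    # Imported lazily so the rest of the module works without MLflow installed.
    import mlflow

    return mlflow.artifacts.download_artifacts(
        artifact_uri=model_uri, dst_path=dst_path
    )


def export_model_to_checkout(
    model_uri: str,
    checkout_dir: str,
    subdir: str = "model_artifact",  # hypothetical folder inside the repo
    download: Callable[[str, str], str] = _mlflow_download,
) -> Path:
    """Copy the model files into <checkout_dir>/<subdir> and return that path."""
    target = Path(checkout_dir) / subdir
    target.mkdir(parents=True, exist_ok=True)
    download(model_uri, str(target))
    return target


def git_commit_commands(
    checkout_dir: str, target: Path, model_uri: str
) -> List[List[str]]:
    """Build the git commands that stage and commit the exported model."""
    rel = Path(target).relative_to(checkout_dir).as_posix()
    return [
        ["git", "-C", str(checkout_dir), "add", rel],
        ["git", "-C", str(checkout_dir), "commit", "-m",
         f"Update model artifact from {model_uri}"],
    ]


def commit_model(checkout_dir: str, target: Path, model_uri: str) -> None:
    """Run the git commands; `git commit` exits non-zero if nothing changed."""
    for cmd in git_commit_commands(checkout_dir, target, model_uri):
        subprocess.run(cmd, check=True)
```

In a pipeline you would call `export_model_to_checkout(MODEL_URI, "/path/to/checkout")` and then `commit_model(...)` on the result. Large model files may be a poor fit for plain git, so something like Git LFS might be worth considering.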

You can also treat MLflow as the 'source control' at runtime and have the production jobs refer to and access the latest model from MLflow directly.
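
A minimal sketch of that second option, where the production job resolves the model from the registry at runtime. It assumes the model was logged in a format that `mlflow.pyfunc` can load and that the job can reach the registry. `my_model`, the `Production` stage, and the helper names are placeholders. Newer MLflow releases also favor aliases over stages, so the URI form may need adjusting.

```python
def registry_uri(name: str, stage_or_version: str) -> str:
    """Build a Model Registry URI such as models:/my_model/Production."""
    return f"models:/{name}/{stage_or_version}"


def load_registered_model(name: str = "my_model", stage: str = "Production"):
    """Load whichever model version is currently in the given stage."""
    # Imported lazily so the URI helper can be used without MLflow installed.
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(registry_uri(name, stage))
```

A job would then call `model = load_registered_model()` followed by `model.predict(batch)`. Promotion becomes a stage transition in the registry rather than a file copy between environments.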

Maybe I don't understand what you're trying to achieve here and you can clarify.

3 REPLIES

maranBH
New Contributor III

Hi, thanks @Kaniz Fatma​, is there any advice you can give me for this particular problem?

Anonymous
Not applicable

@Rodrigo Maranzana​ - Thank you for your patience. I am looking for someone to help you.
