Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What is the difference between mlflow projects and mlflow model?

Anonymous
Not applicable

Both seem to be ways of packaging ML work. When should I use one over the other?

3 REPLIES

User15787040559
New Contributor III

MLflow Projects are a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file or simply convention to specify its dependencies and how to run the code. For example, projects can contain a `conda.yaml` file for specifying a Python Conda environment. When you use the MLflow Tracking API in a Project, MLflow automatically remembers the project version (for example, Git commit) and any parameters. You can easily run existing MLflow Projects from GitHub or your own Git repository, and chain them into multi-step workflows.
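As a rough illustration of the layout described above, here is a minimal sketch that writes a tiny project directory to a temporary folder. The project name, the `alpha` parameter, the `train.py` script, and the environment contents are all made up for the example; only the `MLproject` / `conda.yaml` file names and the general descriptor layout follow the MLflow Projects format.

```python
import tempfile
from pathlib import Path

# Descriptor file: names the environment file and the entry points.
# "example_project", "alpha" and "train.py" are hypothetical.
MLPROJECT = """\
name: example_project

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
"""

# Conda environment the project runs in (contents are illustrative).
CONDA_ENV = """\
name: example_env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
"""

# Placeholder training script; a real one would fit a model here.
TRAIN_SCRIPT = """\
import argparse

import mlflow

parser = argparse.ArgumentParser()
parser.add_argument("--alpha", type=float, default=0.5)
args = parser.parse_args()

with mlflow.start_run():
    # Stand-in metric so the run records something.
    mlflow.log_metric("alpha_squared", args.alpha ** 2)
"""


def write_project(root: Path) -> Path:
    """Create the project directory and write the three files into it."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "MLproject").write_text(MLPROJECT)
    (root / "conda.yaml").write_text(CONDA_ENV)
    (root / "train.py").write_text(TRAIN_SCRIPT)
    return root


project_dir = write_project(Path(tempfile.mkdtemp()) / "example_project")
print(sorted(p.name for p in project_dir.iterdir()))
# ['MLproject', 'conda.yaml', 'train.py']
```

With MLflow and Conda installed, a directory like this could then be run with something like `mlflow run <project_dir> -P alpha=0.3`.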

MLflow Models offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help you deploy them. Each Model is saved as a directory containing arbitrary files and a descriptor file that lists several “flavors” the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data. MLflow provides tools to deploy many common model types to diverse platforms: for example, any model supporting the “Python function” flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. If you output MLflow Models using the Tracking API, MLflow also automatically remembers which Project and run they came from.

Please see this for more reference

tj-cycyota
New Contributor III

MLflow Projects - these are a standardized way to package up code related to a specific data science or machine learning "project". For example, if you have a workflow to pre-process data (step 1) and train a model (step 2), you could package this up into an "MLproject" spec file similar to this. Many organizations use this format to build conformity across disparate teams working on projects, and to ensure projects are repeatable (e.g. model training happens the same way every time) across an entire code base.

MLflow Models - these are the easiest way to abstract the way in which a model was trained from the way it is deployed. For example, you may want to use the latest-and-greatest ML framework (say, PyTorch) but you're not sure how this model will be deployed: batch scoring using Spark? A real-time API endpoint? Using MLflow Models, when you train and log a model, MLflow records multiple "flavors" describing how that particular trained model can be loaded and deployed. Then you can deploy it however you like (e.g. as a Python function, a.k.a. `pyfunc`) without worrying about the underlying ML framework.

sean_owen
Honored Contributor II

One thing I think is useful to point out for Databricks users is that you would typically not use MLflow Projects to describe execution of a modeling run. You would just use MLflow directly in Databricks and use Databricks notebooks to manage code and libraries. However, you can still execute an MLflow Project against Databricks from outside Databricks.
