
What is the difference between MLflow Projects and MLflow Models?

Anonymous

They both seem to package things up. When should I use one over the other?

3 replies

User15787040559
New Contributor III

MLflow Projects are a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file or simply convention to specify its dependencies and how to run the code. For example, projects can contain a `conda.yaml` file for specifying a Python Conda environment. When you use the MLflow Tracking API in a Project, MLflow automatically remembers the project version (for example, Git commit) and any parameters. You can easily run existing MLflow Projects from GitHub or your own Git repository, and chain them into multi-step workflows.
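To make the layout concrete, here is a minimal sketch that writes a one-entry-point project into a temporary directory. The project name, the `alpha` parameter, and `train.py` are hypothetical; the `MLproject` keys used (`name`, `conda_env`, `entry_points`, `parameters`, `command`) follow the documented project format.

```python
import tempfile
from pathlib import Path

# Descriptor for a hypothetical project: points at conda.yaml for dependencies
# and declares one entry point ("main") that takes one parameter.
MLPROJECT = """\
name: example_project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
"""

# Environment that MLflow recreates before running the entry point's command.
CONDA_YAML = """\
name: example_env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
"""

# Placeholder training script: it only parses the parameter and logs it.
TRAIN_PY = """\
import argparse
import mlflow

parser = argparse.ArgumentParser()
parser.add_argument("--alpha", type=float, default=0.5)
args = parser.parse_args()

with mlflow.start_run():
    mlflow.log_param("alpha", args.alpha)
"""


def scaffold_project(root: Path) -> Path:
    """Write a minimal MLflow Project layout into `root` and return it."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "MLproject").write_text(MLPROJECT)
    (root / "conda.yaml").write_text(CONDA_YAML)
    (root / "train.py").write_text(TRAIN_PY)
    return root


if __name__ == "__main__":
    project = scaffold_project(Path(tempfile.mkdtemp()) / "example_project")
    print(sorted(p.name for p in project.iterdir()))
```

With MLflow and Conda installed, `mlflow run <project-dir> -P alpha=0.1` against the generated directory should build the environment and execute the `main` entry point.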

MLflow Models offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help you deploy them. Each Model is saved as a directory containing arbitrary files and a descriptor file that lists several "flavors" the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data. MLflow provides tools to deploy many common model types to diverse platforms: for example, any model supporting the "Python function" flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. If you output MLflow Models using the Tracking API, MLflow also automatically remembers which Project and run they came from.
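The "flavors" idea can be illustrated with a toy, stdlib-only analogue. This is not MLflow's API: real MLflow writes a YAML `MLmodel` file and loads the generic flavor with `mlflow.pyfunc.load_model`, whereas here the descriptor is JSON and every name (`ThresholdModel`, `LOADERS`, `descriptor.json`) is hypothetical. The point is that the generic loader reads only the descriptor, so deployment tooling never needs to know the model's native framework.

```python
import json
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical framework-native model: labels values above a threshold as 1."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, rows):
        return [1 if x > self.threshold else 0 for x in rows]


def _load_threshold_model(data_path: Path) -> ThresholdModel:
    params = json.loads(data_path.read_text())
    return ThresholdModel(params["threshold"])


# Registry standing in for MLflow's per-flavor loader modules (toy assumption).
LOADERS = {"threshold_model": _load_threshold_model}


def save_model(model: ThresholdModel, path: Path) -> None:
    """Save the model's data once and describe it with two flavors."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "params.json").write_text(json.dumps({"threshold": model.threshold}))
    descriptor = {
        "flavors": {
            # Native flavor: useful only to tools that know ThresholdModel.
            "threshold_model": {"data": "params.json"},
            # Generic flavor: names the loader that yields an object with predict().
            "python_function": {"loader": "threshold_model", "data": "params.json"},
        }
    }
    (path / "descriptor.json").write_text(json.dumps(descriptor, indent=2))


def load_python_function(path: Path):
    """Generic loader: needs only the descriptor, not the model's framework."""
    descriptor = json.loads((path / "descriptor.json").read_text())
    flavor = descriptor["flavors"]["python_function"]
    model = LOADERS[flavor["loader"]](path / flavor["data"])
    return model.predict


if __name__ == "__main__":
    model_dir = Path(tempfile.mkdtemp()) / "model"
    save_model(ThresholdModel(0.5), model_dir)
    predict = load_python_function(model_dir)
    print(predict([0.2, 0.7]))  # [0, 1]
```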

Please see this for more reference

tj-cycyota
New Contributor III

MLflow Projects - these are a standardized way to package up code related to a specific data science or machine learning "project". For example, if you have a workflow to pre-process data (step 1) and train a model (step 2), you could package this up into an "MLproject" spec file. Many organizations use this format to build consistency across disparate teams working on projects, and to ensure projects are repeatable (e.g. model training happens the same way every time) across an entire code base.
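A two-step workflow like this might be sketched as follows, assuming an `MLproject` with hypothetical `preprocess` and `train` entry points (shown in the comment). The driver only builds `mlflow run` commands (`-e` selects the entry point, `-P` passes a parameter) and executes them in order when `dry_run=False`. MLflow also exposes `mlflow.run()` for doing the same from Python, so treat this as one possible shape rather than the prescribed one.

```python
import shlex
import subprocess

# Assumed MLproject spec (entry points and parameters are hypothetical):
#
#   name: two_step_example
#   conda_env: conda.yaml
#   entry_points:
#     preprocess:
#       parameters:
#         input: path
#         output: path
#       command: "python preprocess.py --input {input} --output {output}"
#     train:
#       parameters:
#         data: path
#       command: "python train.py --data {data}"


def build_step(project_uri: str, entry_point: str, params: dict) -> list:
    """Build an `mlflow run` command for one entry point of a project."""
    cmd = ["mlflow", "run", project_uri, "-e", entry_point]
    for key, value in params.items():
        cmd += ["-P", f"{key}={value}"]
    return cmd


def run_workflow(project_uri: str, raw_path: str, dry_run: bool = True) -> list:
    """Chain the preprocess step into the train step via a shared file path."""
    clean_path = raw_path + ".clean"  # hypothetical hand-off location
    steps = [
        build_step(project_uri, "preprocess", {"input": raw_path, "output": clean_path}),
        build_step(project_uri, "train", {"data": clean_path}),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain if a step exits with an error.
            subprocess.run(cmd, check=True)
    return steps


if __name__ == "__main__":
    run_workflow(".", "/tmp/raw.csv")  # dry run: prints both commands
```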

MLflow Models - these are the easiest way to decouple how a model was trained from how it is deployed. For example, you may want to use the latest-and-greatest ML framework (say, PyTorch) but not yet know how the model will be deployed: batch scoring using Spark? A real-time API endpoint? With MLflow Models, when you log a trained model, MLflow automatically records multiple "flavors" describing how that particular model can be loaded and deployed. You can then deploy it however you like (e.g. as a generic Python function, a.k.a. `pyfunc`) without worrying about the underlying ML framework.
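The decoupling described above is essentially an adapter pattern. A toy sketch, not MLflow code: two hypothetical "framework" models with different native APIs are wrapped behind one `predict(rows)` interface, and both the batch and the request-style serving functions depend only on that interface. In real MLflow the common interface is the `python_function` (`pyfunc`) flavor, and custom wrappers can subclass `mlflow.pyfunc.PythonModel`.

```python
from typing import Callable, Protocol, Sequence


class GenericModel(Protocol):
    """The only interface deployment code relies on (toy analogue of pyfunc)."""

    def predict(self, rows: Sequence[float]) -> list: ...


class TorchLikeNet:
    """Hypothetical framework model exposing `forward` instead of `predict`."""

    def forward(self, x: float) -> float:
        return 2.0 * x + 1.0


class TorchLikeWrapper:
    """Adapts the framework-specific API to the generic predict interface."""

    def __init__(self, net: TorchLikeNet):
        self.net = net

    def predict(self, rows: Sequence[float]) -> list:
        return [self.net.forward(x) for x in rows]


class FunctionWrapper:
    """Adapts a plain scoring function to the same interface."""

    def __init__(self, fn: Callable[[float], float]):
        self.fn = fn

    def predict(self, rows: Sequence[float]) -> list:
        return [self.fn(x) for x in rows]


def score_batch(model: GenericModel, batches: list) -> list:
    """Batch 'deployment': never inspects which framework produced the model."""
    return [model.predict(batch) for batch in batches]


def handle_request(model: GenericModel, payload: dict) -> dict:
    """Real-time 'deployment': same interface, different serving shape."""
    return {"predictions": model.predict(payload["inputs"])}


if __name__ == "__main__":
    print(score_batch(TorchLikeWrapper(TorchLikeNet()), [[0.0, 1.0], [2.0]]))
    print(handle_request(FunctionWrapper(lambda x: x * x), {"inputs": [3.0]}))
```

Swapping in a model from another framework only requires a new wrapper; `score_batch` and `handle_request` stay unchanged.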

sean_owen
Honored Contributor II

One thing I think it's useful to point out for Databricks users is that you would typically not use MLflow Projects to describe execution of a modeling run. You would just use MLflow directly in Databricks and use Databricks notebooks to manage code and libraries. However, you can still execute an MLflow Project against Databricks from outside Databricks.
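Running a Project on Databricks from an external machine might look like the sketch below. It assumes MLflow is installed locally and Databricks credentials are already configured; the cluster spec values, the experiment path, and the use of the public `mlflow-example` repository are placeholders. `--backend databricks` and `--backend-config` are the `mlflow run` options for this; the code only prepares the command and leaves submission commented out.

```python
import json
import os
import shlex
import tempfile
from pathlib import Path

# Placeholder "new cluster" spec; spark_version and node_type_id are assumptions
# and must be replaced with values that are valid in your workspace.
CLUSTER_SPEC = {
    "spark_version": "13.3.x-scala2.12",
    "num_workers": 1,
    "node_type_id": "i3.xlarge",
}


def prepare_databricks_run(project_uri: str, experiment_name: str, workdir: Path):
    """Write the cluster spec and return (command, environment) for `mlflow run`."""
    spec_path = workdir / "cluster-spec.json"
    spec_path.write_text(json.dumps(CLUSTER_SPEC, indent=2))
    cmd = [
        "mlflow", "run", project_uri,
        "--backend", "databricks",
        "--backend-config", str(spec_path),
        "--experiment-name", experiment_name,
    ]
    # Send tracking data to the workspace. DATABRICKS_HOST and DATABRICKS_TOKEN
    # (or a Databricks CLI profile) are assumed to be configured already.
    env = dict(os.environ, MLFLOW_TRACKING_URI="databricks")
    return cmd, env


if __name__ == "__main__":
    cmd, env = prepare_databricks_run(
        "https://github.com/mlflow/mlflow-example",
        "/Users/someone@example.com/projects-demo",  # hypothetical experiment path
        Path(tempfile.mkdtemp()),
    )
    print(shlex.join(cmd))
    # To actually submit: subprocess.run(cmd, env=env, check=True)
```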
