
What is the difference between mlflow projects and mlflow model?

Anonymous
Not applicable

They both seem to be packaging mechanisms. When should one use one over the other?

3 REPLIES

User15787040559
New Contributor III

MLflow Projects are a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file, or simply convention, to specify its dependencies and how to run the code. For example, projects can contain a conda.yaml file for specifying a Python Conda environment. When you use the MLflow Tracking API in a Project, MLflow automatically remembers the project version (for example, Git commit) and any parameters. You can easily run existing MLflow Projects from GitHub or your own Git repository, and chain them into multi-step workflows.
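For instance, running a Project from Git and chaining two steps might look like the sketch below; the repository URL, entry points, and parameter names are hypothetical.

```python
import mlflow

# Step 1: run a project straight from a Git repo; MLflow records the
# exact commit and parameters for the run.
prep = mlflow.projects.run(
    uri="https://github.com/example-org/example-project",  # hypothetical repo
    entry_point="preprocess",                               # hypothetical entry point
    parameters={"input_path": "data/raw.csv"},
)

# Step 2: chain a second entry point onto the first run's output.
mlflow.projects.run(
    uri="https://github.com/example-org/example-project",
    entry_point="train",
    parameters={"preprocess_run_id": prep.run_id},
)
```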

MLflow Models offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help you deploy them. Each Model is saved as a directory containing arbitrary files and a descriptor file that lists several "flavors" the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data. MLflow provides tools to deploy many common model types to diverse platforms: for example, any model supporting the "Python function" flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. If you output MLflow Models using the Tracking API, MLflow also automatically remembers which Project and run they came from.
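As a minimal sketch of the flavor mechanism (the model and data here are just placeholders): log a scikit-learn model, then load it back purely through the generic pyfunc flavor, with no scikit-learn-specific code on the loading side.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run() as run:
    # Logged with both an "sklearn" flavor and a generic
    # "python_function" flavor in the model's descriptor (MLmodel) file.
    mlflow.sklearn.log_model(model, "model")

# Load through the framework-agnostic pyfunc flavor and score new data.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:5]))
```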

Please see this for more details.

User16776431030
New Contributor III

MLflow Projects - these are a standardized way to package up code related to a specific data science or machine learning "project". For example, if you have a workflow to pre-process data (step 1) and train a model (step 2), you could package this up into an "MLproject" spec file like the sketch below. Many organizations use this format to build conformity across disparate teams working on projects, and to ensure projects are repeatable (e.g. model training happens the same way every time) across an entire code base.
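A hypothetical MLproject file for that two-step workflow might look roughly like this (names, scripts, and parameters are illustrative, not from the thread):

```yaml
name: example_two_step_project        # hypothetical project name

conda_env: conda.yaml                 # dependencies resolved from this file

entry_points:
  preprocess:                         # step 1: prepare the data
    parameters:
      input_path: {type: string}
    command: "python preprocess.py --input {input_path}"
  train:                              # step 2: train the model
    parameters:
      alpha: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha}"
```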

MLflow Models - these are the easiest way to abstract the way in which a model was trained from the way it is deployed. For example, you may want to use the latest-and-greatest ML framework (say, PyTorch), but you're not sure how this model will be deployed: batch scoring using Spark? A real-time API endpoint? With MLflow Models, when you log a trained model, it is saved with multiple "flavors" describing how that particular model can be loaded and deployed. Then you can deploy it however you like (e.g. as a generic Python function, a.k.a. `pyfunc`) without worrying about the underlying ML framework.
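A rough sketch of that framework-agnostic idea, assuming an active SparkSession `spark`, an input DataFrame `df`, and a model already logged at the placeholder URI below:

```python
import mlflow.pyfunc

model_uri = "runs:/<run_id>/model"  # placeholder URI for a logged model

# Batch scoring: wrap the model as a Spark UDF and apply it to a DataFrame
# (assumes `spark` and `df` already exist, e.g. in a Databricks notebook).
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri)
scored = df.withColumn("prediction", predict_udf(*df.columns))

# Real-time or scripted use: load the same model as a plain Python function.
model = mlflow.pyfunc.load_model(model_uri)
```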

sean_owen
Honored Contributor II

One thing I think it's useful to point out for Databricks users is that you would typically not use MLflow Projects to describe the execution of a modeling run. You would just use MLflow directly in Databricks and use Databricks notebooks to manage code and libraries. However, you can still execute an MLflow Project against Databricks from outside Databricks.
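For example, a sketch of launching a Project on Databricks from an external machine; the repository URL and cluster spec file are hypothetical, and it assumes Databricks CLI credentials are already configured:

```python
import mlflow

# Launch the project on a new Databricks cluster instead of locally.
mlflow.projects.run(
    uri="https://github.com/example-org/example-project",  # hypothetical repo
    backend="databricks",
    backend_config="new-cluster-spec.json",  # JSON file describing the cluster
)
```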
