<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic docs.databricks.com in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/docs-databricks-com/m-p/13842#M738</link>
    <description>2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (post 1 of 2)</description>
    <pubDate>Fri, 08 Oct 2021 16:05:02 GMT</pubDate>
    <dc:creator>Joseph_B</dc:creator>
    <dc:date>2021-10-08T16:05:02Z</dc:date>
    <item>
      <title>docs.databricks.com</title>
      <link>https://community.databricks.com/t5/machine-learning/docs-databricks-com/m-p/13842#M738</link>
      <description>&lt;P&gt;&lt;B&gt;&lt;U&gt;2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (post 1 of 2)&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you to everyone who joined the Automating the ML Lifecycle With Databricks Machine Learning webinar!&amp;nbsp;You can access the &lt;A href="https://databricks.com/p/webinar/automating-the-ml-lifecycle-with-databricks-machine-learning?itm_data=product-resources-automatingMLlifecycle" alt="https://databricks.com/p/webinar/automating-the-ml-lifecycle-with-databricks-machine-learning?itm_data=product-resources-automatingMLlifecycle" target="_blank"&gt;&lt;U&gt;on-demand recording here&lt;/U&gt;&lt;/A&gt; and the &lt;A href="https://github.com/RafiKurlansik/dais2021_full_ml_lifecycle" alt="https://github.com/RafiKurlansik/dais2021_full_ml_lifecycle" target="_blank"&gt;&lt;U&gt;code in this GitHub repo&lt;/U&gt;&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're sharing a subset of the questions asked and answered throughout the session, as well as the links to resources from the last slide of the webinar. 
Please feel free to ask follow-up questions or add comments as threads.&amp;nbsp;Due to length limits on Community posts, we’ll split this in two.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Databricks ML&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;How can I enable the Databricks ML workspace?&lt;UL&gt;&lt;LI&gt;In your Databricks workspace, you should be able to see a selector in the upper-left.&amp;nbsp;There's a GIF of selecting it here: &lt;A href="https://docs.databricks.com/applications/machine-learning/index.html" alt="https://docs.databricks.com/applications/machine-learning/index.html" target="_blank"&gt;&lt;U&gt;&lt;/U&gt;&lt;/A&gt;&lt;A href="https://docs.databricks.com/applications/machine-learning/index.html" target="test_blank"&gt;https://docs.databricks.com/applications/machine-learning/index.html&lt;/A&gt;&amp;nbsp;You can also pin a particular “persona” or workspace view to be your default.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;How can I get started with Databricks ML?&lt;UL&gt;&lt;LI&gt;If you want guided tutorials, then the Databricks Academy has great resources, especially its recommended learning path for Data Scientists: &lt;A href="https://academy.databricks.com/data-scientist" alt="https://academy.databricks.com/data-scientist" target="_blank"&gt;&lt;U&gt;&lt;/U&gt;&lt;/A&gt;&lt;A href="https://academy.databricks.com/data-scientist" target="test_blank"&gt;https://academy.databricks.com/data-scientist&lt;/A&gt;&amp;nbsp;These resources are free for customers; contact support if you have trouble accessing them.&lt;/LI&gt;&lt;LI&gt;If there is a particular task you want to do, start from our documentation &lt;A href="https://docs.databricks.com/applications/machine-learning/index.html" alt="https://docs.databricks.com/applications/machine-learning/index.html" target="_blank"&gt;&lt;U&gt;&lt;/U&gt;&lt;/A&gt;&lt;A href="https://docs.databricks.com/applications/machine-learning/index.html" 
target="test_blank"&gt;https://docs.databricks.com/applications/machine-learning/index.html&lt;/A&gt; to find the right page, and look for examples and code for that task.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;AutoML&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;How does your AutoML compare with other enterprise AutoML approaches?&lt;UL&gt;&lt;LI&gt;I'd say the highest level bit is that Databricks AutoML takes a "glass-box" approach, generating notebooks for every model it fits.&amp;nbsp;That allows you to clone and modify the code to further iterate on the models.&amp;nbsp;In general, all AutoML solutions generate pretty good results---but not as good as models with more expert knowledge incorporated.&amp;nbsp;This code generation approach lets data scientists get a reasonable model quickly and then incorporate their domain expertise to improve the model further.&amp;nbsp;For a good intro to it, I'd recommend checking out the Data AI Summit 2021 keynote on Databricks ML: &lt;A href="https://youtu.be/zQEiwJqqeeA" alt="https://youtu.be/zQEiwJqqeeA" target="_blank"&gt;&lt;U&gt;&lt;/U&gt;&lt;/A&gt;&lt;A href="https://youtu.be/zQEiwJqqeeA" target="test_blank"&gt;https://youtu.be/zQEiwJqqeeA&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;General MLflow&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;What support do MLflow and Databricks have for R?&lt;UL&gt;&lt;LI&gt;MLflow has native support for R.&amp;nbsp;You can find the R API docs here: &lt;A href="https://mlflow.org/docs/latest/R-api.html" alt="https://mlflow.org/docs/latest/R-api.html" target="_blank"&gt;&lt;U&gt;&lt;/U&gt;&lt;/A&gt;&lt;A href="https://mlflow.org/docs/latest/R-api.html" target="test_blank"&gt;https://mlflow.org/docs/latest/R-api.html&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;If you're working within Databricks, then Databricks Runtimes provide R and many common packages out-of-the-box.&amp;nbsp;We generally recommend using the &lt;A 
href="https://docs.databricks.com/runtime/mlruntime.html" alt="https://docs.databricks.com/runtime/mlruntime.html" target="_blank"&gt;&lt;U&gt;Databricks Runtime for Machine Learning&lt;/U&gt;&lt;/A&gt; since it provides more ML-specific packages, and it makes it easy to run &lt;A href="https://docs.databricks.com/spark/latest/sparkr/rstudio.html" alt="https://docs.databricks.com/spark/latest/sparkr/rstudio.html" target="_blank"&gt;&lt;U&gt;RStudio in Databricks&lt;/U&gt;&lt;/A&gt;.&amp;nbsp;To find which version of R each runtime uses, you can check the &lt;A href="https://docs.databricks.com/release-notes/runtime/releases.html" alt="https://docs.databricks.com/release-notes/runtime/releases.html" target="_blank"&gt;&lt;U&gt;runtime release notes&lt;/U&gt;&lt;/A&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;What is MLflow autologging vs. Databricks autologging?&lt;UL&gt;&lt;LI&gt;MLflow provides autologging, which automatically tracks ML training activity for certain libraries.&amp;nbsp;For example, &lt;A href="https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html" alt="https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html" target="_blank"&gt;&lt;U&gt;mlflow.sklearn.autolog()&lt;/U&gt;&lt;/A&gt; triggers tracking for scikit-learn, picking up parameters, metrics, and models when you train models.&amp;nbsp;You can also call &lt;A href="https://www.mlflow.org/docs/latest/python_api/mlflow.html" alt="https://www.mlflow.org/docs/latest/python_api/mlflow.html" target="_blank"&gt;&lt;U&gt;mlflow.autolog()&lt;/U&gt;&lt;/A&gt; to turn on &lt;A href="https://www.mlflow.org/docs/latest/tracking.html#automatic-logging" alt="https://www.mlflow.org/docs/latest/tracking.html#automatic-logging" target="_blank"&gt;&lt;U&gt;all types of MLflow autologging&lt;/U&gt;&lt;/A&gt;.&amp;nbsp;"Databricks Autologging" turns on MLflow autologging by default, and it just entered Public Preview in many regions: &lt;A 
href="https://docs.databricks.com/applications/mlflow/databricks-autologging.html" alt="https://docs.databricks.com/applications/mlflow/databricks-autologging.html" target="_blank"&gt;&lt;U&gt;&lt;/U&gt;&lt;/A&gt;&lt;A href="https://docs.databricks.com/applications/mlflow/databricks-autologging.html" target="test_blank"&gt;https://docs.databricks.com/applications/mlflow/databricks-autologging.html&lt;/A&gt;&amp;nbsp;(That's for AWS, but there's an equivalent page for Azure and GCP.)&amp;nbsp;You can find more info in those docs.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;What ML frameworks are supported by MLflow?&lt;UL&gt;&lt;LI&gt;MLflow has built-in support for many common frameworks, but it is also pluggable and can be used with any ML framework.&lt;/LI&gt;&lt;LI&gt;For autologging, the MLflow docs provide a &lt;A href="https://mlflow.org/docs/latest/tracking.html#automatic-logging" alt="https://mlflow.org/docs/latest/tracking.html#automatic-logging" target="_blank"&gt;&lt;U&gt;list of built-in integrations&lt;/U&gt;&lt;/A&gt;, as well as &lt;A href="https://www.mlflow.org/docs/latest/tracking.html#logging-data-to-runs" alt="https://www.mlflow.org/docs/latest/tracking.html#logging-data-to-runs" target="_blank"&gt;&lt;U&gt;info on custom logging&lt;/U&gt;&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;For saving models (MLflow Models and “flavors”), the MLflow docs provide a &lt;A href="https://mlflow.org/docs/latest/models.html#built-in-model-flavors" alt="https://mlflow.org/docs/latest/models.html#built-in-model-flavors" target="_blank"&gt;&lt;U&gt;list of built-in integrations&lt;/U&gt;&lt;/A&gt;, as well as info on customization via &lt;A href="https://mlflow.org/docs/latest/models.html#python-function-python-function" alt="https://mlflow.org/docs/latest/models.html#python-function-python-function" target="_blank"&gt;&lt;U&gt;pyfunc models&lt;/U&gt;&lt;/A&gt; and &lt;A href="https://mlflow.org/docs/latest/models.html#custom-flavors" 
alt="https://mlflow.org/docs/latest/models.html#custom-flavors" target="_blank"&gt;&lt;U&gt;custom flavors&lt;/U&gt;&lt;/A&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;How can I track which dataset was used to train each model in MLflow and Databricks?&lt;UL&gt;&lt;LI&gt;If you're using Databricks AutoML, it automatically logs the dataset to the MLflow Tracking Server.&lt;/LI&gt;&lt;LI&gt;If you’re writing custom ML code, then your best options are:&lt;UL&gt;&lt;LI&gt;For Spark data sources, especially Delta: If you use autologging and read from a Spark data source, autologging records that source as a tag in the MLflow run.&amp;nbsp;If that's a Delta data source, then it also saves the table version number.&lt;/LI&gt;&lt;LI&gt;For non-Spark data sources (e.g., loading via pandas), you can always log a custom tag or param to save the dataset location, ID, or version number.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Model Registry&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is the MLflow registry restricted to a workspace? Or can multiple workspaces push to a centralized or common registry?&lt;UL&gt;&lt;LI&gt;You can set up a multi-workspace registry.&amp;nbsp;That's common for splitting into dev/test/prod workspaces, all of which share one registry.&amp;nbsp;Here's some more info on that: &lt;A href="https://docs.databricks.com/applications/machine-learning/manage-model-lifecycle/multiple-workspaces.html" alt="https://docs.databricks.com/applications/machine-learning/manage-model-lifecycle/multiple-workspaces.html" target="_blank"&gt;https://docs.databricks.com/applications/machine-learning/manage-model-lifecycle/multiple-workspaces.html&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Oct 2021 16:05:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/docs-databricks-com/m-p/13842#M738</guid>
      <dc:creator>Joseph_B</dc:creator>
      <dc:date>2021-10-08T16:05:02Z</dc:date>
    </item>
  </channel>
</rss>

