Joseph_B
Databricks Employee

2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (post 1 of 2)

Thank you to everyone who joined the Automating the ML Lifecycle With Databricks Machine Learning webinar! You can access the on-demand recording here and the code in this GitHub repo.

We're sharing a subset of the questions asked and answered during the session, along with the resource links from the last slide of the webinar. Please feel free to ask follow-up questions or add comments as threads. Due to length limits on Community posts, we've split this into two posts.

Databricks ML

AutoML

  • How does your AutoML compare with other enterprise AutoML approaches?
    • I'd say the highest-level difference is that Databricks AutoML takes a "glass-box" approach, generating a notebook for every model it fits. That lets you clone and modify the code to iterate on the models further. In general, all AutoML solutions produce reasonably good results, but not as good as models with expert knowledge incorporated. This code-generation approach lets data scientists get a reasonable model quickly and then incorporate their domain expertise to improve it; a minimal sketch of the workflow follows this list. For a good intro, I'd recommend the Data + AI Summit 2021 keynote on Databricks ML: https://youtu.be/zQEiwJqqeeA
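
To make the glass-box workflow concrete, here is a minimal sketch of launching an AutoML classification run and then locating the generated notebook for the best trial. The table name and target column are hypothetical placeholders; the databricks.automl calls follow the documented Python API, but treat the exact attribute names as assumptions.

```python
# Minimal sketch, assuming a Databricks ML runtime notebook where `spark`
# is predefined. The table name and target column are hypothetical.
from databricks import automl

train_df = spark.read.table("main.default.churn_train")  # hypothetical Delta table

summary = automl.classify(
    dataset=train_df,       # training DataFrame (features + target)
    target_col="churned",   # hypothetical target column name
    timeout_minutes=30,     # stop the experiment after this time budget
)

# AutoML generates an editable notebook per trial (the "glass-box" part);
# the best trial's generated notebook is linked from the returned summary.
print(summary.best_trial.notebook_url)
```

From there, you can clone the generated notebook and fold in your own feature engineering or model tweaks, which is the iteration loop described above.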

General MLflow

  • What support do MLflow and Databricks have for R?
  • What is MLflow autologging vs. Databricks autologging?
  • What ML frameworks are supported by MLflow?
  • How can I track which dataset was used to train each model in MLflow and Databricks?
    • If you're using Databricks AutoML, it automatically logs the dataset to the MLflow Tracking Server.
  • If you're writing custom ML code, then your best options are (a minimal sketch follows this list):
      • For Spark data sources, especially Delta: if you use autologging and read from a Spark data source, the source is logged as a tag on the MLflow run. If it's a Delta source, the table version number is saved as well.
      • For non-Spark data sources (e.g., data loaded via pandas), you can always log a custom tag or param to record the dataset location, ID, or version number.
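
As a concrete illustration of the custom-code option, here is a minimal sketch of recording dataset information on an MLflow run by hand. The path and version values are hypothetical placeholders; mlflow.set_tag and mlflow.log_param are standard MLflow Tracking calls.

```python
# Minimal sketch: record which dataset trained a model when loading data
# outside Spark (e.g., with pandas). Path and version are hypothetical.
import mlflow
import pandas as pd

data_path = "/dbfs/ml/datasets/churn/train.parquet"  # placeholder location
data_version = "v3"                                  # placeholder version label

df = pd.read_parquet(data_path)

with mlflow.start_run():
    # Tags and params appear in the MLflow UI alongside the run's metrics and model.
    mlflow.set_tag("dataset_path", data_path)
    mlflow.log_param("dataset_version", data_version)
    # ... feature prep, model training, mlflow.log_model(...), etc. ...
```

For the Spark-source case in the first bullet, enabling mlflow.spark.autolog() (or Databricks autologging) records the Spark datasource info on the run as a tag automatically, with the Delta table version included when applicable.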

Model Registry

(Continued in post 2 of 2.)
