I'm fitting multiple models in parallel. For each one, I'm logging lots of params and metrics to MLflow. I'm hitting rate limits, which is causing problems in my jobs.
My dataset has an "item" column which partitions the rows into many groups. (Think of these groups as items in a store.) I want to fit 1 ML model per group. Should I tune hyperparameters for each group separately? Or should I tune them once for the entire dataset?
2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (Post 2 of 2)

Thank you to everyone who joined! You can access the on-demand recording here and the code in this GitHub repo. We're sharing a subset of the questions asked and answered during the webinar.
2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (Post 1 of 2)

Thank you to everyone who joined the Automating the ML Lifecycle With Databricks Machine Learning webinar! You can access the on-demand recording here and the code in this GitHub repo.
I believe it's still the best option. That said, it would be good to know what the OData API is needed for. When I added the original answer, Databricks SQL was nowhere near where it is today, and it's now easy to connect DB SQL directly to Power BI...
The first thing to try is to log in batches. If you are logging each param and metric separately, you're making one API call per param and one per metric. Instead, you should use the batch logging APIs; e.g., use `log_params` instead of `log_param`: http...
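For illustration, here's a minimal sketch of batched logging in Python; the param and metric names are placeholders, not anything specific to your job:

```python
import mlflow

# Illustrative values only; substitute your own params and metrics.
params = {"max_depth": 6, "learning_rate": 0.1, "num_trees": 200}
metrics = {"rmse": 0.42, "mae": 0.31, "r2": 0.87}

with mlflow.start_run():
    mlflow.log_params(params)    # one API call for all params
    mlflow.log_metrics(metrics)  # one API call for all metrics
```

Each call sends the whole dictionary in a single request, so a run with dozens of params and metrics drops from dozens of API calls to a handful, which goes a long way toward staying under the rate limits.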
For the first question ("which option is better?"), the answer depends on your understanding of the problem domain. Do you expect similar behavior across the groups (items)? If so, that's a +1 in favor of sharing hyperparameters. And vice versa: if the items are likely to behave quite differently, that's a point in favor of tuning hyperparameters per group.
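To make the trade-off concrete, here's a rough sketch of fitting one model per item group with shared hyperparameters, using `applyInPandas` on a Spark DataFrame. The column names, features, and model here are placeholder assumptions; tuning per group would mean running a small search inside `fit_per_item` instead of reusing `SHARED_PARAMS`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Option 1: hyperparameters tuned once and shared across all item groups.
SHARED_PARAMS = {"n_estimators": 100, "max_depth": 5}

def fit_per_item(pdf: pd.DataFrame) -> pd.DataFrame:
    # Fit one model for a single "item" group; feature/label column names are made up.
    features = pdf[["feature1", "feature2"]]
    labels = pdf["label"]
    model = RandomForestRegressor(**SHARED_PARAMS)
    model.fit(features, labels)
    return pd.DataFrame(
        {"item": [pdf["item"].iloc[0]],
         "train_score": [model.score(features, labels)]}
    )

# With a Spark DataFrame `df` that has an "item" column:
# results = df.groupBy("item").applyInPandas(
#     fit_per_item,
#     schema="item string, train_score double",
# )
```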
Both are valid choices. By default, I'd recommend using Hyperopt nowadays. Here's the rationale, as pros & cons of each.

Spark ML's built-in tools
- Pros: These fit the Spark ML Pipeline framework, so you can keep using the same type of APIs.
- Cons: These...
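As an illustration of the Hyperopt side, here's a minimal sketch; the model, dataset, and search space are placeholders and not tied to the webinar code:

```python
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials, SparkTrials
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

def objective(params):
    # Hyperopt minimizes the loss, so return the negated CV score.
    model = RandomForestRegressor(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=0,
    )
    score = cross_val_score(model, X, y, cv=3).mean()
    return {"loss": -score, "status": STATUS_OK}

search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 300, 50),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
}

best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,   # adaptive (TPE) search rather than an exhaustive grid
    max_evals=20,
    trials=Trials(),    # on Databricks, SparkTrials(parallelism=4) distributes trials across the cluster
)
print(best)
```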
The MLflow run was probably created either (a) via notebook autologging or (b) via a call to `mlflow.start_run()`. With (a), when the notebook first logs something to MLflow, it starts a run. But if the notebook is still active and attached to a cluster...
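A small sketch of (b) in Python, with placeholder param/metric names, showing how the context manager keeps a run from lingering as the notebook's active run:

```python
import mlflow

# (b) An explicit run: the context manager ends the run when the block exits.
with mlflow.start_run(run_name="example-run"):  # run_name is just an illustrative value
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.42)

# If autologging (or a bare mlflow.start_run()) left a run open in the notebook,
# it can be closed explicitly; this is a no-op when no run is active.
mlflow.end_run()
```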