Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

mlflow project train and validate - Control over the data used in the script?

VirajV
New Contributor

Hi there,

I'm trying to decide whether to get started with ML, and I have really enjoyed it so far.

While going through the documentation, I hit a blocker: it doesn't say much about the dataset used to train the model.

Model = Data + (Algorithm & Hyperparameters)

[Screenshot not available]

I don't see an example in the documentation where an MLflow Project is run on different data (CSV, SQL, code-based, etc.).

The command shown in the screenshot,

"mlflow run sklearn_elasticnet_wine -P alpha=0.5"

would retrain a model with different hyperparameters, but on what data?

Has the data already been included in the project, and can you change it to train the model on different data?
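To make the question concrete, here is a rough sketch of what I would expect to be possible: a training script that takes the data location as a parameter, so it could be passed with -P the same way as alpha. The `--data-path` name, the CSV input, and the fingerprint idea are my own assumptions and are not taken from the documentation or from the sklearn_elasticnet_wine example.

```python
import argparse
import csv
import hashlib


def parse_args(argv=None):
    # "--data-path" is a hypothetical parameter name; "--alpha" mirrors
    # the "-P alpha=0.5" example from the question.
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-path", required=True)
    parser.add_argument("--alpha", type=float, default=0.5)
    return parser.parse_args(argv)


def load_rows(path):
    # Read a CSV file with a header row into a list of dicts.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def file_fingerprint(path):
    # SHA-256 of the raw file bytes, so each run can record which data it saw.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def main(argv=None):
    args = parse_args(argv)
    rows = load_rows(args.data_path)
    fingerprint = file_fingerprint(args.data_path)
    # A real script would fit the model on `rows` here and could store the
    # path and fingerprint with the run (for example via mlflow.log_param).
    return {"alpha": args.alpha, "n_rows": len(rows), "data_sha256": fingerprint}
```

With something like this, calling `main(["--data-path", "wine.csv", "--alpha", "0.5"])` (where `wine.csv` is a placeholder file name) would train on whichever file is passed in, and the recorded hash would show whether two runs used the same data.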

How do you store and track the datasets being used?

Can someone explain, please?

Thanks,

0 REPLIES 0
