Data Engineering

MLflow project train and validate - control over the data used in the script?

VirajV
New Contributor

Hi there,

I'm trying to decide whether to get started with ML and have really enjoyed it so far.

While going through the documentation, I hit a blocker: it doesn't say much about the dataset used to train the model.

Model = Data + (Algorithm & Hyperparameters)

[Screenshot]

I don't see an example in the documentation where an MLflow Project is run on different data (CSV, SQL, code-based, etc.).

The command shown in the screenshot, "mlflow run sklearn_elasticnet_wine -P alpha=0.5", would retrain a model with different hyperparameters, but on what data?

Is the data already included in the project, and can you change it to train the model on different data?

How do you store and track the datasets being used?

Can someone please explain?

Thanks,

1 REPLY

Kaniz
Community Manager

Hi @VirajV! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.
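
For anyone landing on this thread with the same question, here is a minimal sketch of one possible approach, not the documented behaviour of the sklearn_elasticnet_wine example. It assumes the training script takes the dataset location as an ordinary parameter next to alpha, and records a hash of the file so that each run can be traced back to the exact data it used. The names data_path, dataset_fingerprint and run_params are hypothetical, chosen only for illustration.

```python
import argparse
import hashlib
import tempfile
from pathlib import Path


def parse_args(argv=None):
    """Parse the hyperparameter and the dataset location from the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--alpha", type=float, default=0.5)
    # Assumption: the dataset is a local file whose path is passed in explicitly.
    parser.add_argument("--data-path", required=True)
    return parser.parse_args(argv)


def dataset_fingerprint(path):
    """Return the SHA-256 hex digest of the file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_params(args):
    """Collect everything that defines the run: hyperparameters plus data identity."""
    return {
        "alpha": args.alpha,
        "data_path": args.data_path,
        "data_sha256": dataset_fingerprint(args.data_path),
    }


if __name__ == "__main__":
    # Small demonstration on a throwaway CSV file.
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "wine.csv"
        csv_path.write_bytes(b"quality,alcohol\n5,9.4\n")
        args = parse_args(["--alpha", "0.5", "--data-path", str(csv_path)])
        print(run_params(args))
```

If the project's MLproject file declared a matching data_path parameter (an assumption about your own project, not something the example ships with), the same -P syntax could carry it, for example "mlflow run . -P alpha=0.5 -P data_path=my_data.csv". The resulting dictionary could then be passed to mlflow.log_params inside the run, so the data path and hash show up alongside the hyperparameters.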
