mlflow project train and validate - Control over t... - Databricks Community - 17489

Hi there,

I'm trying out ML to decide whether to get started with it properly, and I've really enjoyed it so far.

While going through the documentation, I hit a blocker: it doesn't say much about the dataset used to train the model.

Model = Data + (Algorithm & Hyperparameters)
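To make the equation concrete, here is a minimal sketch that assumes 1-D ridge regression without an intercept as a stand-in for the ElasticNet model in the wine example. The same algorithm with the same `alpha`, fit on two different toy datasets, produces two different models. The function name `fit_ridge_1d` and the data values are made up for illustration.

```python
from typing import Sequence

# Illustration of "Model = Data + (Algorithm & Hyperparameters)".
# Algorithm: 1-D ridge regression without an intercept (a simple stand-in for
# the ElasticNet model used in the wine example).


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], alpha: float) -> float:
    """Return the weight w that minimizes sum((y - w*x)**2) + alpha * w**2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + alpha).
    return sxy / (sxx + alpha)


alpha = 0.5  # same hyperparameter for both runs

# Two toy datasets with made-up values.
xs_a, ys_a = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
xs_b, ys_b = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]

w_a = fit_ridge_1d(xs_a, ys_a, alpha)
w_b = fit_ridge_1d(xs_b, ys_b, alpha)
print(w_a, w_b)  # two different models from the same algorithm and alpha
```

Only the data changed between the two calls, so any difference in the fitted weight comes from the data alone.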

I don't see an example in the documentation where an MLflow Project is run on different data (CSV, SQL, code-based, etc.).
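One common pattern is to keep training code independent of where the data comes from by having every source produce rows of the same shape. The sketch below is not part of MLflow: it assumes hypothetical helper names (`rows_from_csv`, `rows_from_sql`, `rows_from_code`), uses SQLite as a stand-in for a SQL database, and assumes all columns are numeric.

```python
import csv
import os
import sqlite3
import tempfile
from typing import Callable, Dict, List

# Hypothetical helpers: each loader returns the same shape (a list of
# {column: float} rows), so training code does not care where data came from.
Row = Dict[str, float]


def rows_from_csv(path: str, delimiter: str = ",") -> List[Row]:
    """Read a CSV file with a header row; all columns are assumed numeric."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        return [{k: float(v) for k, v in row.items()} for row in reader]


def rows_from_sql(db_path: str, query: str) -> List[Row]:
    """Run a query against a SQLite database; selected columns are assumed numeric."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(query)
        names = [d[0] for d in cursor.description]
        return [{n: float(v) for n, v in zip(names, values)} for values in cursor.fetchall()]
    finally:
        conn.close()


def rows_from_code(make_rows: Callable[[], List[Row]]) -> List[Row]:
    """Build the dataset programmatically (e.g. synthetic or derived data)."""
    return make_rows()


# Usage: the same toy dataset (made-up values) from three kinds of source.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "train.csv")
with open(csv_path, "w", newline="") as f:
    f.write("alcohol,quality\n9.4,5\n10.5,6\n")

db_path = os.path.join(tmp, "train.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE train (alcohol REAL, quality REAL)")
conn.executemany("INSERT INTO train VALUES (?, ?)", [(9.4, 5), (10.5, 6)])
conn.commit()
conn.close()

from_csv = rows_from_csv(csv_path)
from_sql = rows_from_sql(db_path, "SELECT alcohol, quality FROM train ORDER BY rowid")
from_code = rows_from_code(
    lambda: [{"alcohol": 9.4, "quality": 5.0}, {"alcohol": 10.5, "quality": 6.0}]
)
```

Because all three loaders return identical rows, swapping the data source only changes which loader is called, not the training step.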

The command shown in the screenshot, "mlflow run sklearn_elasticnet_wine -P alpha=0.5", would retrain the model with different hyperparameters, but on what data?

Is the data already included in the project, and can you change it so the model trains on different data?
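MLflow Projects let an entry point in the MLproject file declare parameters (supported types include `string`, `float`, `path`, and `uri`), which are passed to its command with `-P name=value`. One option, assuming you are free to edit the project, is to add a data parameter alongside the hyperparameters (for example `-P data_path=...`) and have the training script read it. The sketch below shows only the script side using `argparse`; the `--data-path` option, the file path, and the default values are hypothetical, and `l1_ratio` is ElasticNet's L1/L2 mixing parameter.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse training parameters, including a hypothetical data-path option.

    data_path is an assumed addition so the data source is chosen at run time
    instead of being fixed inside the training script. Defaults are illustrative.
    """
    parser = argparse.ArgumentParser(description="Train a model on a chosen dataset.")
    parser.add_argument("--alpha", type=float, default=0.5)
    parser.add_argument("--l1-ratio", type=float, default=0.1)
    parser.add_argument("--data-path", required=True, help="CSV file to train on")
    return parser.parse_args(argv)


# Simulate the arguments a project command line would pass in.
args = parse_args(["--alpha", "0.5", "--data-path", "data/wine-2024.csv"])
print(args.alpha, args.l1_ratio, args.data_path)
```

Exposing the path as a parameter keeps the choice of data visible on the command line next to the hyperparameters, instead of hidden inside the script.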

How do you store and track the datasets being used?
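A path alone does not prove which data a run used, because the file at that path can change. A minimal sketch, assuming file-based data, is to record a content hash next to the path; the function name and key names are hypothetical. With MLflow, a dict like this could be logged with `mlflow.log_params` inside a run, and newer MLflow releases also include dataset tracking (`mlflow.data` with `mlflow.log_input`); check the documentation for your version.

```python
import hashlib
import os
import tempfile
from typing import Dict


def dataset_fingerprint(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Describe a dataset file so a run can record exactly which data it used.

    Returns string values so the dict can be logged as run parameters or tags.
    The key names are hypothetical; pick whatever convention your team uses.
    """
    digest = hashlib.sha256()
    n_lines = 0
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            n_lines += chunk.count(b"\n")
    return {
        "data_path": os.path.abspath(path),
        "data_sha256": digest.hexdigest(),
        "data_lines": str(n_lines),
    }


# Usage with a made-up file. With MLflow installed, the result could be passed
# to mlflow.log_params(...) inside an active run (not done here).
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "train.csv")
with open(path, "wb") as f:
    f.write(b"alcohol,quality\n9.4,5\n10.5,6\n")

info = dataset_fingerprint(path)
print(info["data_sha256"][:12], info["data_lines"])
```

If the file is edited, its hash changes, so two runs that point at the same path but different contents can still be told apart.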

Could someone please explain?

Thanks,
