cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Model from code approach

ddpotapov
New Contributor II

Hi Databricks Team,

I am trying to understand the "model from code" approach. I am reading your Big Book of MLOps.

Is it correct that when using this approach I need to train the model twice - in development and in production?
I am asking because in this case the process of training the model can be very expensive if we train the model twice (train in development and in production).

Thank you.

2 REPLIES 2

Alberto_Umana
Databricks Employee
Databricks Employee

Hello @ddpotapov,

It is not necessarily required to train the model twice (in development and in production). There are two common patterns for moving ML artifacts through staging and into production:

 

  1. Deploy Code Approach:
    • In this pattern, the code to train models is developed in the development environment. The same code moves to staging and then production.
    • The model is trained in each environment: initially in the development environment as part of model development, in staging (on a limited subset of data) as part of integration tests, and in the production environment (on the full production data) to produce the final model.
    • This approach allows the model to be trained on production data in the production environment, which can be beneficial if access to production data is restricted.
  2. Deploy Models Approach:
    • In this pattern, the model artifact is generated by training code in the development environment. The artifact is then tested in the staging environment before being deployed into production.
    • This approach is suitable when model training is very expensive or hard to reproduce, and it only requires training the model once in the development environment.

The choice between these patterns depends on factors such as the cost of model training, access to production data, and the complexity of the deployment process.

ddpotapov
New Contributor II

Thank you for your answer. 
You said:
initially in the development environment as part of model development

What does this mean?

Usually, I take a model, run a lot of training experiments with different hyperparameters. And when I find the best parameters, I train the model one last time to get the best final model. During these experiments, I use all the data I have for training.

In this case, it means that I will have a final trained model in the development environment, and after staging, I have to train the model again in the prod environment with the same data according to the Deploy Code Approach.

Can you clarify this point? Maybe I don't understand something. Thank you.


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group