Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Automating model history with multiple downstream elements

pol7451
New Contributor

Hey,

We have two models, A and B.

Model A is fed raw data that is first cleaned, enriched, and forecasted.

The results from Model A are then fed into Model B.

The processes for cleaning, enriching, forecasting, Model A, and Model B are all under version control in Git.

We're looking for a way to automate model history and show an end user why the result would look different if the same raw data were fed in today vs. the last time it was run.

I don't think there is any built-in functionality within Databricks to do this... At a high level I was thinking of something like the flow in the attached image suggests, and was wondering if there is a better solution or if I've fallen into any clangers?

thanks for your help!

2 REPLIES

Anonymous
Not applicable

@polly halton

Here are some high-level steps you can follow:

  1. Store model input data and output results: As part of your data processing pipeline, capture the input data (raw data) fed into Model A and the output results (forecasts) it generates. You can store this data in a versioned storage system such as Delta Lake, or in a traditional version control system like Git (see the first sketch after this list).
  2. Record model metadata: Along with the input data and output results, capture metadata about the models, such as the model version, the date/time of the run, and any relevant parameters or configurations used. This metadata can be stored alongside the input data and output results (also covered in the first sketch below).
  3. Build a model history tracking system: Create a system that tracks model history and stores the metadata in a way that is easy to query and display to end users. This can be done with a combination of Databricks tools such as Delta Lake and Spark SQL, plus visualization libraries like Matplotlib or Plotly (see the second sketch below).
  4. Create visualizations for model history: Use visualization libraries to build informative, interactive views that display the model history to end users. For example, line charts, bar charts, or heatmaps can show how the model results have changed over time across model versions or across runs of the same model (see the third sketch below).
  5. Incorporate model history into end-user applications: Integrate the model history visualizations into your end-user applications or reporting dashboards, so that users can easily access and interpret the model history information.
  6. Automate the process: Automate the capture of model history metadata and the creation of visualizations as part of your data processing pipeline, using scheduling or workflow automation tools in Databricks such as Databricks Workflows.
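
For steps 1 and 2, here is a minimal sketch of what the capture could look like, assuming a Databricks notebook with an active Spark session and MLflow available. The table names (history.model_a_inputs, history.model_a_outputs) and the forecast() call are hypothetical placeholders for your own pipeline:

```python
import datetime
import mlflow
from pyspark.sql import functions as F

def run_model_a(raw_df, git_commit):
    """Run Model A while snapshotting its inputs, outputs, and metadata."""
    with mlflow.start_run(run_name="model_a") as run:
        # Step 2: record metadata that explains why results may differ later.
        mlflow.set_tag("git_commit", git_commit)  # code version that produced this run
        mlflow.log_param("run_timestamp", datetime.datetime.utcnow().isoformat())

        # Step 1: snapshot the exact input, tagged with this MLflow run's id.
        (raw_df.withColumn("run_id", F.lit(run.info.run_id))
               .write.format("delta").mode("append")
               .saveAsTable("history.model_a_inputs"))

        forecasts = forecast(raw_df)  # placeholder for your existing Model A logic

        # Snapshot the output the same way, so input/output pairs share a run_id.
        (forecasts.withColumn("run_id", F.lit(run.info.run_id))
                  .write.format("delta").mode("append")
                  .saveAsTable("history.model_a_outputs"))
        return forecasts
```

Because each snapshot carries the MLflow run id and the Git commit, any later result can be traced back to exactly which code and which data produced it.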
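For step 3, a sketch of how two runs could be compared, building on the hypothetical history.model_a_outputs table above and assuming each forecast row has an id key column and a forecast value column:

```python
# Compare two runs of Model A by run_id (the run ids shown are placeholders).
old = spark.table("history.model_a_outputs").where("run_id = '<earlier_run_id>'")
new = spark.table("history.model_a_outputs").where("run_id = '<latest_run_id>'")

diff = (new.alias("n")
           .join(old.alias("o"), "id")
           .selectExpr("id",
                       "o.forecast AS previous_forecast",
                       "n.forecast AS current_forecast",
                       "n.forecast - o.forecast AS change"))
diff.show()
```

Since the table is Delta, you also get time travel for free, e.g. SELECT * FROM history.model_a_outputs VERSION AS OF 5, which can stand in for the run_id filter if you prefer table versions to run ids.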
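For step 4, a minimal Matplotlib example building on the diff DataFrame from the previous sketch; in a Databricks notebook you could equally use display() or wire the same query into a dashboard:

```python
import matplotlib.pyplot as plt

pdf = diff.toPandas()  # assumes diff is small enough to collect to the driver
plt.bar(pdf["id"].astype(str), pdf["change"])
plt.xlabel("record id")
plt.ylabel("change in forecast between runs")
plt.title("Model A: latest run vs previous run")
plt.show()
```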

By following these steps, you can build a system that captures and displays model history, allowing end users to understand why model results may differ when the same raw data is fed into the models at different points in time. The specific implementation details will vary with your use case and requirements, so thoroughly test and validate the system to ensure the model history information is accurate and reliable.

Anonymous
Not applicable

Hi @polly halton

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
