
Automating model history with multiple downstream elements

pol7451
New Contributor

Hey,

We have two models, A and B.

Model A is fed raw data that is first cleaned, enriched, and forecasted.

The results from Model A are fed into Model B.

The processes for cleaning, enriching, forecasting, Model A, and Model B are all under version control in Git.

We're looking for a way to automate model history and show an end user why the result would look different if the same raw data were fed in today vs. the last time it was run.

I don't think there is any built-in functionality within Databricks to do this. At a high level I was thinking of something like the flow in the attached image, and was wondering if there's a better solution, or if I've fallen into any clangers?

Thanks for your help!

2 REPLIES

Anonymous
Not applicable

@polly halton

Here are some high-level steps you can follow:

  1. Store model input data and output results: As part of your data processing pipeline, capture the input data (raw data) fed into Model A and the output results (forecasts) it generates. You can store these in a versioned storage system such as Delta Lake, or version them alongside your code in Git. (See the first sketch after this list.)
  2. Record model metadata: Along with the input data and output results, capture metadata about the models, such as the model version, the date/time the model was run, and any parameters or configurations used during the run. Store this metadata alongside the inputs and outputs.
  3. Build a model history tracking system: Create a system that tracks the model history and stores the metadata in a way that is easy to query and display to end users. This can be done with a combination of tools in Databricks, such as Delta Lake, Spark SQL, and visualization libraries like Matplotlib or Plotly.
  4. Create visualizations for model history: Use visualization libraries to build informative, interactive views of the model history. For example, line charts, bar charts, or heatmaps can show how the results have changed over time across model versions or runs of the same model. (See the second sketch after this list.)
  5. Incorporate model history into end-user applications: Integrate the model history visualizations into your end-user applications or reporting dashboards, so users can easily access and interpret the history.
  6. Automate the process: Automate the capture of model history metadata and the creation of visualizations as part of your data processing pipeline, using scheduling or workflow tools in Databricks such as Databricks Workflows. (A sketch follows the summary below.)
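Here is a minimal sketch of steps 1 and 2. The `model_history` schema, the `raw_df` DataFrame, the `model_a.transform(...)` call, and the logged tag/parameter names are all placeholders for your own objects, not a prescribed API; MLflow itself ships with the Databricks ML runtime:

```python
import mlflow
from pyspark.sql import functions as F

# Sketch of steps 1-2: snapshot Model A's inputs and outputs to Delta tables
# and record run metadata with MLflow. `raw_df`, `model_a`, and the
# `model_history` schema are placeholders for your own objects.
with mlflow.start_run(run_name="model_a") as run:
    # Metadata that later explains why two runs differ
    mlflow.set_tag("git_commit", "abc1234")     # e.g. injected by your CI job
    mlflow.log_param("forecast_horizon", 30)    # example model parameter

    # Snapshot the exact inputs, stamped with this run's id
    (raw_df
        .withColumn("run_id", F.lit(run.info.run_id))
        .withColumn("run_ts", F.current_timestamp())
        .write.format("delta").mode("append")
        .saveAsTable("model_history.model_a_inputs"))

    predictions_df = model_a.transform(raw_df)  # however Model A is actually invoked

    # Snapshot the outputs with the same run_id so inputs and outputs join up
    (predictions_df
        .withColumn("run_id", F.lit(run.info.run_id))
        .withColumn("run_ts", F.current_timestamp())
        .write.format("delta").mode("append")
        .saveAsTable("model_history.model_a_outputs"))
```

Because the tables are Delta, you also get time travel for free: DESCRIBE HISTORY on a table lists every write, and VERSION AS OF reconstructs what an earlier run saw or produced.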
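And a second sketch for steps 3 and 4: join two runs' outputs on their keys and chart the deltas. This assumes the hypothetical tables above plus an `item_id` key and a `forecast` column, and the run ids are placeholders:

```python
import plotly.express as px

# Sketch of steps 3-4: compare the latest run against a previous one
# and visualize the per-item drift between them.
diff = spark.sql("""
    SELECT cur.item_id,
           cur.forecast                 AS forecast_latest,
           prev.forecast                AS forecast_previous,
           cur.forecast - prev.forecast AS delta
    FROM model_history.model_a_outputs cur
    JOIN model_history.model_a_outputs prev
      ON cur.item_id = prev.item_id
    WHERE cur.run_id  = '<latest_run_id>'
      AND prev.run_id = '<previous_run_id>'
""").toPandas()

# One bar per item showing how far the forecast moved between the two runs
fig = px.bar(diff, x="item_id", y="delta",
             title="Forecast change vs. previous run")
fig.show()
```

Joining this against the Git commit and parameters logged in MLflow lets you show the user not just that the numbers moved, but which code or configuration change moved them.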

By following these steps, you can build a system that captures and displays model history, letting end users understand why results differ when the same raw data is fed into the models at different points in time. The specific implementation will vary with your use case and requirements, so test and validate the system thoroughly to ensure the history it reports is accurate and reliable.
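For step 6, one way to run the whole chain (clean, enrich, forecast, Model A, Model B) on a schedule is the Databricks Jobs 2.1 REST API. A sketch; the workspace URL, token, notebook path, and cluster id are placeholders:

```python
import requests

# Sketch of step 6: create a nightly Databricks job that runs the pipeline
# notebook, so the model history accrues automatically.
resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "name": "model-history-pipeline",
        "tasks": [{
            "task_key": "run_pipeline",
            "notebook_task": {"notebook_path": "/Repos/team/pipeline"},
            "existing_cluster_id": "<cluster-id>",
        }],
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 daily
            "timezone_id": "UTC",
        },
    },
)
resp.raise_for_status()
```

The same job can be defined interactively in the Workflows UI if you'd rather not script it.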

Anonymous
Not applicable

Hi @polly halton

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
