Databricks Community

chrisf_sts · ‎04-14-2024

I have a Naive Bayes ML model that takes call attributes and predicts whether the caller is going to abandon the call while on hold waiting to speak to an agent. The model is registered in MLflow on Databricks.

What I need to do is extract the exact calculations so I can make them myself during the call. Once the user hangs up, the prediction is useless. I want to predict whether the caller breaks the threshold of "likely to abandon" based on a running tally of the features and weights during the call.

Is there a way to extract the calculations being made in the model and make them myself? Because of the way our software is set up, the business owners do not want to import the whole model and make predictions every time a feature is updated. It would use significantly fewer resources to keep a running tally and flag the call once that tally breaks a threshold.

When I asked an AI assistant, it said the calculations are obscured and not easy to extract and reproduce, especially for a Naive Bayes model.

Kaniz_Fatma · ‎05-06-2024

Hi @chrisf_sts,

Naive Bayes Model Overview: Naive Bayes is a probabilistic machine learning algorithm based on Bayes’ Theorem. It’s commonly used for classification tasks, such as spam filtering, document classification, and sentiment prediction. The “naive” part comes from the assumption that features are independent of each other, which simpli...¹.
Predictions with Naive Bayes: Given a trained Naive Bayes model, you can make predictions for new data using Bayes’ theorem. The Maximum A Posteriori (MAP) estimate is used to predict the most likely class. For a new instance d, the predicted class is the hypothesis h that maximizes the product of the conditional probability of the features given the class and the prior probability of the class: \[ h_{\text{MAP}} = \arg\max_{h} \, P(d \mid h) \cdot P(h) \]
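Because a product of probabilities becomes a sum under the logarithm, the two-class decision can be rewritten as an additive score, which is the form a running tally needs. The following is a sketch, assuming two classes (labelled "abandon" and "stay" here for illustration) and an instance d made of features x_1, ..., x_n that are conditionally independent given the class:

```latex
\[
\log \frac{P(\text{abandon} \mid d)}{P(\text{stay} \mid d)}
= \log \frac{P(\text{abandon})}{P(\text{stay})}
+ \sum_{i=1}^{n} \log \frac{P(x_i \mid \text{abandon})}{P(x_i \mid \text{stay})}
\]
```

Each feature contributes one fixed term to the sum, so a feature update changes only that term. Requiring P(abandon | d) > t for a probability threshold t is equivalent to requiring the left-hand side to exceed log(t / (1 - t)).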
Extracting Calculations: How easily you can extract the exact calculations depends on the library and the Naive Bayes variant the model was trained with. The learned parameters (class priors and per-feature conditional probabilities) are stored in the fitted model, but reproducing the prediction by hand means matching that library's preprocessing and smoothing exactly. With that in mind, here are some potential strategies:
- Manual Calculation (Not Recommended): You could theoretically compute the probabilities and weights manually by analyzing the training data and applying Bayes’ theorem. However, this approach is error-prone, time-consuming, and not practical for real-world applications.
- Feature Importance: Instead of extracting exact calculations, consider identifying the most important features for your prediction. Some libraries provide feature importance scores, which can guide you in understanding which features contribute significantly to the model’s decisions.
- Threshold-Based Approach: Since you’re interested in a running tally during the call, you might set up a threshold-based system. Accumulate feature values during the call, and if the tally exceeds a certain threshold, flag the call as “likely to abandon.” This approach doesn’t require extracting model internals but relies on feature accumulation.
- Gaussian Naive Bayes: If your features are continuous (e.g., call duration), consider using Gaussian Naive Bayes. It assumes that features follow a Gaussian distribution, which might be more suitable for continuous...².
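The threshold-based idea above can be tied back to the model itself by keeping the tally in log-odds. Below is a minimal sketch, assuming categorical features and two classes. The feature names, values, and probabilities are hypothetical and made up for illustration; in practice they would be copied from the trained model. If the model was trained with scikit-learn (an assumption, since the thread does not say), the discrete variants such as `MultinomialNB` expose their learned log-probabilities as the attributes `class_log_prior_` and `feature_log_prob_`.

```python
import math

# Hypothetical parameters for illustration only. In a real system these
# would be exported once from the trained model, not typed in by hand.
LOG_PRIOR = {"abandon": math.log(0.3), "stay": math.log(0.7)}
LOG_LIKELIHOOD = {
    "queue_position": {
        "front": {"abandon": math.log(0.2), "stay": math.log(0.6)},
        "back": {"abandon": math.log(0.8), "stay": math.log(0.4)},
    },
    "repeat_caller": {
        "yes": {"abandon": math.log(0.5), "stay": math.log(0.25)},
        "no": {"abandon": math.log(0.5), "stay": math.log(0.75)},
    },
}


class AbandonTally:
    """Running log-odds of 'abandon' versus 'stay' for a single call."""

    def __init__(self, probability_threshold=0.5):
        # Start from the prior log-odds, before any feature is observed.
        self.log_odds = LOG_PRIOR["abandon"] - LOG_PRIOR["stay"]
        # Remember each feature's current contribution so it can be replaced.
        self.contributions = {}
        # P(abandon) > t is equivalent to log-odds > log(t / (1 - t)).
        self.log_odds_threshold = math.log(
            probability_threshold / (1.0 - probability_threshold)
        )

    def update(self, feature, value):
        """Set or overwrite one feature's value; return the current flag."""
        table = LOG_LIKELIHOOD[feature][value]
        new = table["abandon"] - table["stay"]
        old = self.contributions.get(feature, 0.0)
        self.log_odds += new - old
        self.contributions[feature] = new
        return self.flagged()

    def probability(self):
        """Posterior probability of 'abandon' given the features seen so far."""
        return 1.0 / (1.0 + math.exp(-self.log_odds))

    def flagged(self):
        return self.log_odds > self.log_odds_threshold
```

Each update is a dictionary lookup and one addition, so no model needs to be loaded during the call. Under the independence assumption, features that have not been observed yet simply contribute nothing, and once every feature is set the tally matches the posterior computed directly from the same tables.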
Practical Recommendations:
- Monitor Features During Calls: Continuously track relevant features during the call (e.g., call duration, wait time, customer behavior). Update the tally based on these features.
- Set Thresholds Carefully: Experiment with different thresholds to find the right balance between false positives (flagging calls unnecessarily) and false negatives (missing actual abandonments).
- Evaluate and Refine: Regularly evaluate the performance of your threshold-based system using historical data. Adjust thresholds or features as needed.
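One way to carry out the threshold experiments above is to replay past calls and count both kinds of error at each candidate threshold. This is a sketch only: `sweep_thresholds` is a hypothetical helper, and the scores and outcomes below are made-up values, not real call data.

```python
def sweep_thresholds(scored_calls, thresholds):
    """Count false positives and false negatives at each threshold.

    scored_calls: list of (score, abandoned) pairs, where score is the
    tally's value for a past call and abandoned is True if the caller
    actually hung up before reaching an agent.
    """
    results = []
    for t in thresholds:
        # Flagged but the caller stayed on the line.
        false_pos = sum(1 for s, a in scored_calls if s > t and not a)
        # Not flagged but the caller abandoned.
        false_neg = sum(1 for s, a in scored_calls if s <= t and a)
        results.append(
            {"threshold": t, "false_positives": false_pos, "false_negatives": false_neg}
        )
    return results


# Made-up historical scores, for illustration only.
history = [
    (0.9, True), (0.7, True), (0.6, False),
    (0.4, True), (0.2, False), (0.1, False),
]

for row in sweep_thresholds(history, [0.3, 0.5, 0.8]):
    print(row)
```

Raising the threshold trades false positives for false negatives, so the choice depends on which error is more costly for the business.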
Remember that Naive Bayes is just one approach, and there are other models (e.g., logistic regression, decision trees) that might be more interpretable or better suited to your specific problem. Consider discussing this with your team to find the best solution for your use case. 😊¹ ².

If you have any further questions or need additional assistance, feel free to ask!
