@ryojikn and @irtizak, you're right. Databricks Model Serving lets you split traffic between model versions, but it doesn't offer true shadow deployment, where live production traffic is mirrored to a new model for monitoring without affecting user responses.
For now, you can try a couple of custom approaches:
1) Deploy one endpoint with your production model and another with the shadow model. On the client side, duplicate each incoming request to both endpoints, but return only the production model's response to the user. Capture both responses in the endpoints' inference tables and compare them later for analysis.
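As a minimal sketch of the client-side duplication, the helper below fans a request out to both endpoints and returns only the production answer. The callables `call_prod`, `call_shadow`, and `log_pair` are hypothetical placeholders; in practice each would be a `requests.post` to the corresponding serving endpoint's invocations URL, and the logger might write to a Delta table.

```python
import concurrent.futures

def score_with_shadow(payload, call_prod, call_shadow, log_pair):
    """Send the same payload to both endpoints; the user only ever
    sees the production response. All three callables are
    placeholders for your own endpoint clients and logging."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        prod_future = pool.submit(call_prod, payload)
        shadow_future = pool.submit(call_shadow, payload)
        prod_resp = prod_future.result()
        try:
            # A slow or failing shadow model must never affect users.
            shadow_resp = shadow_future.result(timeout=5)
        except Exception:
            shadow_resp = None
    log_pair(payload, prod_resp, shadow_resp)  # persist for later comparison
    return prod_resp
```

Running both calls in parallel keeps the added latency close to that of the production call alone.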
2) Wrap your models inside a PyFunc model and handle the routing within the wrapper itself. Reference the models dynamically through registry aliases (like `champion` and `challenger`), so whenever a model version changes you don't need to update the wrapper code; it automatically resolves the correct model version from the alias when the endpoint is updated.
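A sketch of the routing logic you'd put inside such a wrapper, with the model loader injected so it is easy to test. In a real `mlflow.pyfunc.PythonModel` you'd do the loading in `load_context()` with `mlflow.pyfunc.load_model(...)`; the model name `my_model` here is a hypothetical placeholder.

```python
class ChampionChallengerRouter:
    """Alias-based routing sketch. `load_model` stands in for
    mlflow.pyfunc.load_model; "models:/<name>@<alias>" URIs resolve
    to whichever version currently holds the alias at load time,
    so no version number is ever hard-coded here."""

    def __init__(self, model_name, load_model):
        self.champion = load_model(f"models:/{model_name}@champion")
        self.challenger = load_model(f"models:/{model_name}@challenger")

    def predict(self, model_input):
        # Score the challenger for logging/comparison only...
        shadow_pred = self.challenger.predict(model_input)
        # ...but return only the champion's prediction to the caller.
        return self.champion.predict(model_input)
```

When you promote a new version, you just move the `champion` alias and update the endpoint; the wrapper code itself stays unchanged.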