Thanks for the clarification and links.
Bottom line, what I am trying to avoid is spinning up AWS resources in the background that will incur ongoing charges until I track them down and terminate them (I am a new Databricks customer and still trying to navigate the billing/cost system). The "Serve this model" UI looked suspiciously like it was going to do this, but on second look, maybe not.
I just want to confirm that my only costs, Databricks or AWS, for Whisper and Llama in this Technical Blog post will be for the short duration they are in use. Thanks!