Llama 3.3 generally offers faster inference than earlier Llama versions, with approximately 40% faster responses and reduced batch-processing time.
However, actual performance on Mosaic AI Model Serving also depends on configuration: the provisioned throughput band, whether you run real-time or batch inference, and token usage.
While your use of Unity Catalog functions and custom SQL prompts adds another layer to model performance, it's also worth checking the serving conditions themselves. If the model hasn't been fine-tuned for your specific use case, or if throughput isn't optimized (e.g., a low provisioned-throughput band), latency can increase.
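Before adjusting the throughput band, it can help to measure per-request latency directly so you have a baseline to compare against. The sketch below is a minimal example; `query_endpoint` is a placeholder for your actual client call (for instance, an HTTP POST to the endpoint's invocations URL), not a real Databricks API.

```python
import time
import statistics

def query_endpoint(prompt: str) -> str:
    # Placeholder: replace with a real call to your serving endpoint,
    # e.g. an HTTP POST to its /invocations URL. Simulated here.
    time.sleep(0.01)  # stand-in for network + inference time
    return f"response to: {prompt}"

def measure_latency(prompts, warmup=1):
    # Warm-up requests avoid counting cold-start time against the endpoint.
    for p in prompts[:warmup]:
        query_endpoint(p)
    latencies = []
    for p in prompts:
        start = time.perf_counter()
        query_endpoint(p)
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    lats = measure_latency(["hello"] * 5)
    print(f"median latency: {statistics.median(lats) * 1000:.1f} ms")
```

Running this before and after a configuration change (or against different throughput bands) gives you a concrete latency comparison rather than relying on the general performance claims.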