In one of our projects we are building increasingly complex LLM-based apps (RAG, multi-agent workflows, LangGraph, unstructured ingestion, etc.), and we have doubts about whether these apps should be deployed as MLflow-based serving endpoints. I would like your feedback and input on the questions below; rough sketches of how we currently package and call such an app follow the questions.
1. Are serving endpoints designed to handle larger, more complex apps, such as multi-agent apps with recursive workflows?
2. Are there memory, dependency (PyPI, binary) or other limitations to serving endpoint deployment that would warrant using a serverless container service like Fargate instead?
3. Is it within the intended purpose of serving endpoints to deploy larger, more complex Python LLM apps?
4. How do you handle your frontend or backend authenticating against the model? We have used Databricks service principals, but these get access to files in the workspace root folder, and we would really like to limit access to the serving endpoint only.
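For context on questions 1-3, this is roughly how we package one of these apps today. It is a minimal sketch: `my_agent.build_graph`, the model name, and the input/output schema are placeholders for our actual code, not a claim about the only way to do it.

```python
import mlflow
import mlflow.pyfunc


class AgentWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Compile the LangGraph graph once per model-server worker,
        # not on every request. `my_agent.build_graph` is a placeholder
        # for our own module.
        from my_agent import build_graph
        self.graph = build_graph()

    def predict(self, context, model_input):
        # The serving endpoint hands us a pandas DataFrame; we run the
        # graph on the first "question" value and return the answer.
        question = model_input["question"].iloc[0]
        result = self.graph.invoke({"question": question})
        return result["answer"]


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model=AgentWrapper(),
        pip_requirements=["mlflow", "langgraph", "langchain-core"],
        registered_model_name="multi_agent_app",
    )
```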
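And for question 4, this is roughly how our backend calls the endpoint today with a token issued for the service principal. The endpoint name, environment variable names, and payload are illustrative.

```python
import os

import requests

# Illustrative values; in practice these come from our secret store.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # token issued for the service principal
endpoint_name = "multi_agent_app"       # name of the serving endpoint

# Model Serving exposes one REST scoring URL per endpoint.
url = f"{host}/serving-endpoints/{endpoint_name}/invocations"

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"dataframe_records": [{"question": "What is our refund policy?"}]},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```

The concern is that the same service principal token also grants access to workspace files, and we would like its permissions scoped to this one URL.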