Should elaborate and complex LLM apps be deployed as MLflow serving endpoints?
02-12-2025 07:25 AM - edited 02-12-2025 07:51 AM
In a project we are building increasingly complex LLM-based apps (RAG, multi-agent workflows, LangGraph, unstructured ingestion, etc.), and we have doubts about whether these apps should be deployed as MLflow-based endpoints. I would like your feedback on the following questions:
1. Are serving endpoints made to work for bigger and more complex apps, like multi-agent apps with recursive workflows?
2. Are there memory, dependency (PyPI, binary), or other limitations to serving-endpoint deployment that would warrant using a serverless container service like Fargate instead?
3. Are serving endpoints intended to be used for deploying bigger, more complex Python LLM apps?
4. How do you deal with your frontend or backend authenticating against the model? We have used Databricks service principals, but these get access to files in the workspace root folder, and we would really like to limit access to only the serving endpoint.
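For context on question 4, here is roughly the call pattern we have in mind: exchange the service principal's OAuth credentials for a short-lived token (Databricks OAuth M2M), then call the serving endpoint's invocations route. This is only a sketch; the workspace URL, endpoint name, and credentials below are placeholders, not real values.

```python
import base64
import json
import urllib.parse
import urllib.request

# Placeholders -- substitute your own workspace URL and endpoint name.
WORKSPACE_URL = "https://my-workspace.cloud.databricks.com"
ENDPOINT_NAME = "my-llm-endpoint"


def fetch_sp_token(workspace_url, client_id, client_secret):
    """Exchange service-principal OAuth credentials for a short-lived
    access token (Databricks OAuth M2M, client_credentials grant)."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    data = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    req = urllib.request.Request(
        f"{workspace_url}/oidc/v1/token",
        data=data,
        headers={"Authorization": f"Basic {creds}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def build_invocation(workspace_url, endpoint_name, token, messages):
    """Build the URL, headers, and JSON body for querying a serving endpoint."""
    url = f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = {"messages": messages}
    return url, headers, body


# Usage (network calls, so not executed here):
# token = fetch_sp_token(WORKSPACE_URL, client_id, client_secret)
# url, headers, body = build_invocation(
#     WORKSPACE_URL, ENDPOINT_NAME, token,
#     [{"role": "user", "content": "Hello"}])
# req = urllib.request.Request(url, data=json.dumps(body).encode(),
#                              headers=headers)
# print(json.load(urllib.request.urlopen(req)))
```

The open issue remains that the token is workspace-scoped: the pattern above works, but it does not by itself restrict the service principal to only the endpoint.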
- Labels: GenAI, Generation AI, MlFlow
02-12-2025 12:18 PM
Hi Pal,
I am experimenting with the same points you mentioned.
1, 3 - I have a 100% Azure Databricks RAG solution that processes new files coming from a frontend. I am still trying to assess the costs and infrastructure as it scales.
4 - I have been using the Databricks API a lot with a service principal and a PAT: to upload files to be processed (write permissions on the volume) and to query the model (Can Query permission on the serving endpoint). How is your SP getting access to files in the workspace root folder?
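A minimal sketch of the upload half of that workflow, assuming a PAT and a hypothetical volume path: the Files API takes a PUT of the raw bytes to the volume path.

```python
import urllib.request

WORKSPACE_URL = "https://my-workspace.cloud.databricks.com"  # placeholder


def build_upload_request(workspace_url, pat, volume_path, data):
    """Build a Files API request that writes `data` (bytes) to a UC volume
    path, e.g. volume_path = 'Volumes/main/default/incoming/report.pdf'."""
    return urllib.request.Request(
        f"{workspace_url}/api/2.0/fs/files/{volume_path}?overwrite=true",
        data=data,
        method="PUT",
        headers={
            "Authorization": f"Bearer {pat}",
            "Content-Type": "application/octet-stream",
        },
    )


# Usage (network call, so not executed here):
# with open("report.pdf", "rb") as f:
#     req = build_upload_request(
#         WORKSPACE_URL, pat,
#         "Volumes/main/default/incoming/report.pdf", f.read())
# urllib.request.urlopen(req)
```

With this pattern the SP only needs write access on that one volume plus Can Query on the endpoint, nothing workspace-wide.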
02-22-2025 04:35 AM
It seems like agent serving, currently in public preview, might cover what I am asking about.
One of my heuristics in cloud engineering is to expect that a pressing general need will soon be solved by the cloud (or data platform) provider. It is often wise not to spend too much time on such a problem, as a solution will often be provided before you are done implementing one yourself.
Docs here:
docs.databricks.com/aws/en/generative-ai/agent-framework/deploy-agent
02-13-2025 03:30 AM - edited 02-13-2025 03:59 AM
Actually, maybe "root folder" was imprecise. The point is that the service principal gets file system access: it becomes a regular workspace user, with too much access. If, however, you want to give it specific access beyond that, you can give it access to specific volumes with GRANT statements or via the Permissions button.
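A sketch of the GRANT route, with hypothetical catalog, schema, and volume names: the SP needs USE on the parent catalog and schema for the volume grant to be usable (add WRITE VOLUME if it must also upload files).

```python
def scoped_volume_grants(principal, catalog, schema, volume):
    """Render the SQL statements that scope a service principal to a
    single volume: USE on the parents, READ VOLUME on the volume itself."""
    fq = f"{catalog}.{schema}.{volume}"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`;",
        f"GRANT READ VOLUME ON VOLUME {fq} TO `{principal}`;",
    ]


# Run each statement with spark.sql(stmt) or paste into a SQL editor:
for stmt in scoped_volume_grants(
    "my-sp-application-id", "main", "default", "incoming"
):
    print(stmt)
```

This keeps the SP out of the workspace file system entirely; it only ever sees the one volume it is granted.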

