05-06-2026 09:33 AM
Hi everyone,
we're building a voice chatbot for a customer using a mix of technologies — Databricks, Azure AI Foundry, and a few external containerized services.
Currently, we're tracking requests and logs via Lakebase with custom traces, but I'm now evaluating whether it makes sense to shift to MLflow (Databricks-managed) for tracing instead.
I came across this tutorial on connecting an external environment to MLflow: 👉 Connect your dev environment to MLflow – Databricks Docs
The guide focuses on local/dev setups, but our use case is different:
My questions:
Thanks in advance 🙏
Alessandro
3 weeks ago
Similar response to the one from WorksBuddy.
Short answer: yes, it's supported, and there's a specific Databricks guide for your case.
The tutorial you found is for local/IDE; the production-container equivalent is here: Trace agents deployed outside of Databricks (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/prod-tracing-external).
Same wiring (four env vars + the mlflow-tracing SDK).
1. Feasibility: Yes. Install mlflow-tracing and set DATABRICKS_HOST, DATABRICKS_TOKEN, MLFLOW_TRACKING_URI=databricks (literal string), and MLFLOW_EXPERIMENT_NAME. Traces ship over HTTPS, and async logging is on by default, so tracing stays off your request path. Your container, Databricks components, and Azure AI Foundry can be stitched into one trace via W3C TraceContext header propagation → Distributed Tracing (https://mlflow.org/docs/latest/genai/tracing/app-instrumentation/distributed-tracing).
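To make the header propagation concrete, here is a minimal stdlib-only sketch of the W3C `traceparent` header that each hop forwards. The function names are hypothetical and this is not MLflow's own implementation; in practice the tracing SDK or an OpenTelemetry propagator would normally do this for you.

```python
import re
import secrets

# W3C Trace Context format: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with a random 16-byte trace id and 8-byte span id."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Validate an incoming traceparent header and split it into its fields."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {header!r}")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
    }


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace id and mint a fresh span id for the next hop."""
    ctx = parse_traceparent(incoming)
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{ctx['flags']}"
```

A service that receives a request would call `child_traceparent(request_headers["traceparent"])` and attach the result to its own outgoing calls, so every span ends up under the same trace id.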
2. Recommended for prod — with these upgrades over the dev tutorial:
- OAuth M2M service principal instead of PAT → docs (https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m)
- mlflow-tracing package (~5 MB) instead of full mlflow (~1 GB); don't install both → docs (https://mlflow.org/docs/latest/genai/tracing/lightweight-sdk)
- Unity Catalog–backed experiment to escape the 100K-trace cap and 1,000-trace search ceiling
- Tune MLFLOW_TRACE_SAMPLING_RATIO, async worker/queue sizes, and MLFLOW_TRACE_TIMEOUT_SECONDS → Production Tracing (https://mlflow.org/docs/latest/genai/tracing/prod-tracing)
Limits to plan around (per workspace): 200 QPS trace creation, 25 QPS search, UC ingestion 200 traces/sec & 100 MB/sec per table → Tracing FAQ (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/faq).
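As a rough planning aid, the trace-creation limit can be turned into an upper bound for MLFLOW_TRACE_SAMPLING_RATIO. This sketch assumes one trace per request and a hypothetical 20% headroom factor; the function name and the headroom value are illustrative, not something from the Databricks docs.

```python
def max_sampling_ratio(
    peak_requests_per_sec: float,
    trace_qps_limit: float = 200.0,  # per-workspace trace creation limit quoted above
    headroom: float = 0.8,           # assumption: use only 80% of the limit
) -> float:
    """Largest sampling ratio that keeps trace creation under the budget.

    Assumes every sampled request produces exactly one trace.
    """
    if peak_requests_per_sec <= 0:
        return 1.0
    budget = trace_qps_limit * headroom
    return min(1.0, budget / peak_requests_per_sec)
```

For example, at a peak of 800 requests per second the budget of 160 traces per second gives a ratio of about 0.2, while at 100 requests per second every request can be traced.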
Gotchas: async = fire-and-forget, so flush on graceful shutdown or you lose queued traces on container kill. Keep voice audio/full transcripts as artifacts/URIs and don't inline them in trace payloads (latency overhead climbs to 100 ms or more once a payload exceeds 1 MB). Production Monitoring Delta sync runs roughly every 15 min, so use online judges for realtime alerting.
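One way to keep large transcripts out of trace payloads is a small guard that swaps oversized text for a reference. This is only a sketch: the 64 KiB threshold, the helper name, and the returned field names are assumptions chosen for illustration, and uploading the transcript to the artifact location is left to your own storage code.

```python
MAX_INLINE_BYTES = 64 * 1024  # hypothetical threshold, well below the 1 MB mark


def trace_safe_value(text: str, artifact_uri: str, max_bytes: int = MAX_INLINE_BYTES) -> dict:
    """Return the text inline if it is small, otherwise a pointer plus a preview.

    `artifact_uri` is wherever the caller has already stored the full content.
    """
    size = len(text.encode("utf-8"))
    if size <= max_bytes:
        return {"inline": text, "size_bytes": size}
    return {
        "artifact_uri": artifact_uri,
        "size_bytes": size,
        "preview": text[:200],  # short excerpt so the trace stays readable
    }
```

The returned dict can then be used as the span input or output in place of the raw transcript.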
3. Alternatives:
- Stay on Lakebase custom traces — fine, but you rebuild dashboards/search/judges yourself
- Recommended: MLflow → Databricks (above)
- OpenTelemetry collector → multi-sink — MLflow spans are OTel-compatible, good if you also want Datadog/Grafana or vendor-neutrality
- Self-hosted OSS MLflow + Postgres — only if data-residency forces it; you operate everything and lose UC/judges
Net recommendation: the external-deployment MLflow pattern with mlflow-tracing SDK + OAuth M2M + a UC-backed experiment. It's the documented path, gets you centralized observability in Databricks, and unlocks the LLM-as-judge eval layer (hallucination, PII, response-relevance) that a voice agent will want next.
3 weeks ago
Hey Alessandro,
1. Technically feasible? Yes.
MLflow's tracking client is just HTTP. As long as your container can reach the Databricks-managed MLflow endpoint, you can send traces from anywhere. Set your workspace host and token as environment variables inside the container, point the tracking URI to Databricks, and you're connected.
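A startup check along these lines can catch a misconfigured container before the first request. The variable names are the four mentioned elsewhere in this thread; the helper itself is hypothetical, and if you authenticate with a service principal instead of a token you would adjust the required list accordingly.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_EXPERIMENT_NAME",
)


def tracing_env_problems(env: Optional[Mapping[str, str]] = None) -> list:
    """Return a list of human-readable configuration problems (empty if fine)."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    uri = env.get("MLFLOW_TRACKING_URI")
    if uri and uri != "databricks":
        # For Databricks-managed MLflow the URI is the literal string "databricks".
        problems.append(f"MLFLOW_TRACKING_URI is {uri!r}, expected 'databricks'")
    return problems
```

Calling `tracing_env_problems()` at container start and failing fast on a non-empty result is usually easier to debug than silent missing traces later.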
2. Production gotchas:
The biggest one for a voice chatbot: don't let MLflow calls block your response path. Send traces asynchronously so observability never adds to user-facing latency.
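To picture what "asynchronous" means here, the following is a minimal stdlib sketch of a fire-and-forget exporter with an explicit flush. It is not how MLflow implements async logging; the class name and the `export` callback are hypothetical stand-ins for whatever actually ships a trace.

```python
import queue
import threading
from typing import Any, Callable


class BackgroundExporter:
    """Hands records to a worker thread so the request path never waits on I/O."""

    def __init__(self, export: Callable[[Any], None], max_queue: int = 1000):
        self._export = export
        self._queue = queue.Queue(maxsize=max_queue)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record: Any) -> bool:
        """Enqueue without blocking; returns False if the record was dropped."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            return False  # drop rather than add latency to the caller

    def _run(self) -> None:
        while True:
            record = self._queue.get()
            try:
                self._export(record)
            except Exception:
                pass  # a failed export must not take down the worker
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued record has been handed to `export`."""
        self._queue.join()
```

The request handler only ever calls `submit`, and `flush` belongs in a shutdown hook (for example a SIGTERM handler) so queued records are not lost when the container stops.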
Use a service principal over a PAT for auth. PATs expire, and manual rotation will catch you off guard in production. Also sort out your network config early. If your ACI and Databricks workspace are in different VNets, private endpoints or IP allow-listing need to be in place before anything else.
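For reference, service-principal auth boils down to an OAuth client-credentials exchange. The sketch below only builds the token request without sending it. It assumes the workspace-level token endpoint `/oidc/v1/token` and the `all-apis` scope as described in the Databricks OAuth M2M docs, so verify both against your workspace. The function name is hypothetical, and the Databricks SDK will typically perform this exchange for you.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request.

    Assumption: the workspace exposes the token endpoint at /oidc/v1/token.
    """
    url = host.rstrip("/") + "/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The service principal's client id and secret go in an HTTP Basic header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending the request with `urllib.request.urlopen` would return a JSON body containing a short-lived access token, which is the main operational advantage over a long-lived PAT.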
3. Alternative worth considering:
If latency is still a concern after going async, an OpenTelemetry sidecar collector running alongside your container gives you better control over batching and buffering before traces hit the network. You can export via OTLP to Databricks without the direct MLflow overhead.
The hybrid architecture you're describing is a common pattern now. MLflow as the centralized observability layer is a reasonable call. Just keep the tracing async and auth service-principal-based before you go live.
Good luck with the build.