Similar response to the one from WorksBuddy.
Short answer: yes, it's supported, and there's a specific Databricks guide for your case.
The tutorial you found covers local/IDE development; the production-container equivalent is here: Trace agents deployed outside of Databricks (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/prod-tracing-external).
It uses the same wiring: four env vars plus the mlflow-tracing SDK.
1. Feasibility: yes. Install mlflow-tracing and set DATABRICKS_HOST, DATABRICKS_TOKEN, MLFLOW_TRACKING_URI=databricks (the literal string), and MLFLOW_EXPERIMENT_NAME. Traces ship over HTTPS, and async logging is on by default, so trace export stays off your request path. Your container, Databricks components, and Azure AI Foundry can be stitched into one trace via W3C TraceContext header propagation → Distributed Tracing (https://mlflow.org/docs/latest/genai/tracing/app-instrumentation/distributed-tracing).
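To make the wiring concrete, here is a minimal sketch of a startup check for those four variables, plus the shape of a traced handler. The function names (tracing_config_problems, build_traced_handler, handle_turn) are hypothetical and only for illustration; mlflow.trace is the decorator exposed by the mlflow-tracing package, and the handler body is a placeholder for your real agent call.

```python
import os

# The four variables named above; "databricks" is the literal tracking URI.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_EXPERIMENT_NAME",
)


def tracing_config_problems(env):
    """Return a list of problems; an empty list means the wiring looks complete."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    uri = env.get("MLFLOW_TRACKING_URI")
    if uri and uri != "databricks":
        problems.append("MLFLOW_TRACKING_URI must be the literal string 'databricks'")
    return problems


def build_traced_handler():
    """Wrap a request handler with mlflow.trace (needs mlflow-tracing installed)."""
    import mlflow  # provided by the mlflow-tracing package

    @mlflow.trace
    def handle_turn(utterance: str) -> str:  # hypothetical handler name
        return utterance.upper()  # placeholder for the real agent call

    return handle_turn


if __name__ == "__main__":
    # Fail loudly at container start instead of silently dropping traces later.
    for problem in tracing_config_problems(os.environ):
        print("config problem:", problem)
```

Running the check at container start is a design choice, not something the docs require; it just surfaces a missing variable before the first request.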
2. Recommended for prod, with these upgrades over the dev tutorial:
- OAuth M2M service principal instead of PAT → docs (https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m)
- mlflow-tracing package (~5 MB) instead of full mlflow (~1 GB); don't install both → docs (https://mlflow.org/docs/latest/genai/tracing/lightweight-sdk)
- Unity Catalog–backed experiment to escape the 100K-trace cap and 1,000-trace search ceiling
- Tune MLFLOW_TRACE_SAMPLING_RATIO, async worker/queue sizes, and MLFLOW_TRACE_TIMEOUT_SECONDS → Production Tracing (https://mlflow.org/docs/latest/genai/tracing/prod-tracing)
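For the OAuth M2M bullet, this is roughly what the client-credentials exchange looks like on the wire. It is a sketch under assumptions: the /oidc/v1/token path and the all-apis scope are what I recall from the linked OAuth M2M docs for workspace-level tokens, so verify them there, and build_m2m_token_request is a hypothetical helper name. In practice Databricks client libraries typically do this exchange for you once the service principal's client ID and secret are configured, so treat this as an illustration rather than something you need to hand-roll.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(host, client_id, client_secret):
    """Build (but do not send) an OAuth client-credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # HTTP Basic auth with the service principal's client ID and secret.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/oidc/v1/token",  # assumed endpoint path
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

The returned tokens are short-lived, which is the main reason to prefer this over a long-lived PAT in a container.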
Limits to plan around (per workspace): 200 QPS trace creation, 25 QPS search, UC ingestion 200 traces/sec & 100 MB/sec per table → Tracing FAQ (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/faq).
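A quick way to turn the trace-creation limit into a sampling setting: the sketch below computes the largest sampling ratio that keeps you under the cap. It assumes one trace per request, and the headroom factor of 0.8 is my own illustrative margin, not a documented value; max_sampling_ratio is a hypothetical helper.

```python
def max_sampling_ratio(peak_requests_per_sec, trace_qps_limit=200.0, headroom=0.8):
    """Largest sampling ratio that keeps trace creation under the QPS limit.

    Assumes one trace per request; headroom leaves slack for bursts.
    """
    if peak_requests_per_sec <= 0:
        return 1.0
    return min(1.0, trace_qps_limit * headroom / peak_requests_per_sec)


if __name__ == "__main__":
    for rps in (100, 800):
        print(rps, "req/s ->", max_sampling_ratio(rps))
```

The result is what you would feed into MLFLOW_TRACE_SAMPLING_RATIO; below the limit it stays at 1.0, so you keep every trace.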
Gotchas: async logging is fire-and-forget, so flush on graceful shutdown or you lose queued traces when the container is killed. Keep voice audio and full transcripts as artifacts/URIs; don't inline them in trace payloads (latency overhead climbs to 100 ms or more once a payload exceeds 1 MB). The Production Monitoring Delta sync runs roughly every 15 min, so use online judges for realtime alerting.
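The flush-on-shutdown point can be sketched as a SIGTERM handler. This assumes mlflow.flush_trace_async_logging() is available in your installed version (it is the flush call I know from the MLflow tracing docs, but check it against your mlflow-tracing release); make_flush_handler and install_handlers are hypothetical helper names.

```python
import signal


def flush_mlflow_traces():
    """Drain the async trace queue if mlflow is importable (assumed API, see above)."""
    try:
        import mlflow
    except ImportError:
        return
    mlflow.flush_trace_async_logging()


def make_flush_handler(flush=flush_mlflow_traces):
    """Return a signal handler that flushes queued traces, then exits cleanly."""

    def handler(signum, frame):
        flush()
        raise SystemExit(0)

    return handler


def install_handlers():
    """Call once from the main thread at startup."""
    handler = make_flush_handler()
    signal.signal(signal.SIGTERM, handler)  # sent by the orchestrator on stop
    signal.signal(signal.SIGINT, handler)   # Ctrl+C during local runs
```

Container orchestrators usually follow SIGTERM with SIGKILL after a grace period, so the flush has to finish inside that window; a hard kill still loses whatever is queued.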
3. Alternatives:
- Stay on Lakebase custom traces — fine, but you rebuild dashboards/search/judges yourself
- Recommended: MLflow → Databricks (above)
- OpenTelemetry collector → multi-sink — MLflow spans are OTel-compatible, good if you also want Datadog/Grafana or vendor-neutrality
- Self-hosted OSS MLflow + Postgres — only if data-residency forces it; you operate everything and lose UC/judges
Net recommendation: the external-deployment MLflow pattern with mlflow-tracing SDK + OAuth M2M + a UC-backed experiment. It's the documented path, gets you centralized observability in Databricks, and unlocks the LLM-as-judge eval layer (hallucination, PII, response-relevance) that a voice agent will want next.