Databricks Community

Ale_Armillotta · 05-06-2026

Hi everyone,

we're building a voice chatbot for a customer using a mix of technologies — Databricks, Azure AI Foundry, and a few external containerized services.

Currently, we're tracking requests and logs via Lakebase with custom traces, but I'm now evaluating whether it makes sense to shift to MLflow (Databricks-managed) for tracing instead.

I came across this tutorial on connecting an external environment to MLflow: 👉 Connect your dev environment to MLflow – Databricks Docs

The guide focuses on local/dev setups, but our use case is different:

- The chatbot runs in an external container (not inside Databricks)
- We need to track GenAI traces (inputs, outputs, latency, etc.) in production
- We want a centralized observability layer directly in Databricks

My questions:

1. Is it technically feasible to send MLflow traces from an external production container to a Databricks-managed MLflow instance?
2. Is this approach recommended for production, or are there known limitations/gotchas?
3. Any alternative patterns you'd suggest for GenAI observability in this kind of hybrid architecture?

Thanks in advance 🙏

Alessandro

MoJaMa · 3 weeks ago

Similar response to the one from WorksBuddy.

Short answer: yes, it's supported, and there's a specific Databricks guide for your case.

The tutorial you found is for local/IDE; the production-container equivalent is here: Trace agents deployed outside of Databricks (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/prod-tracing-external).
Same wiring (four env vars + the mlflow-tracing SDK).

1. Feasibility — Yes. Install mlflow-tracing, set DATABRICKS_HOST, DATABRICKS_TOKEN, MLFLOW_TRACKING_URI=databricks (literal string), MLFLOW_EXPERIMENT_NAME. Traces ship over HTTPS, async logging is on by default so it's off your request path. Your container, Databricks components, and Azure AI Foundry can be stitched into one trace via W3C TraceContext header propagation → Distributed Tracing (https://mlflow.org/docs/latest/genai/tracing/app-instrumentation/distributed-tracing).

2. Recommended for prod — with these upgrades over the dev tutorial:

- OAuth M2M service principal instead of PAT → docs (https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m)

- mlflow-tracing package (~5 MB) instead of full mlflow (~1 GB); don't install both → docs (https://mlflow.org/docs/latest/genai/tracing/lightweight-sdk)

- Unity Catalog–backed experiment to escape the 100K-trace cap and 1,000-trace search ceiling

- Tune MLFLOW_TRACE_SAMPLING_RATIO, async worker/queue sizes, and MLFLOW_TRACE_TIMEOUT_SECONDS → Production Tracing (https://mlflow.org/docs/latest/genai/tracing/prod-tracing)
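As a rough sketch of what the production wiring above could look like, the helper below assembles the container environment with OAuth M2M credentials in place of DATABRICKS_TOKEN. DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET are the variable names Databricks unified authentication uses for service principals; the helper name, its defaults, and the example values are assumptions for illustration only.

```python
from typing import Dict, Optional


def build_tracing_env(
    host: str,
    client_id: str,
    client_secret: str,
    experiment: str,
    sampling_ratio: float = 1.0,
    trace_timeout_seconds: Optional[int] = None,
) -> Dict[str, str]:
    """Return the environment variables to inject into the container (hypothetical helper)."""
    if not 0.0 <= sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be between 0.0 and 1.0")
    env = {
        "DATABRICKS_HOST": host,
        # OAuth M2M service principal credentials instead of a PAT.
        "DATABRICKS_CLIENT_ID": client_id,
        "DATABRICKS_CLIENT_SECRET": client_secret,
        "MLFLOW_TRACKING_URI": "databricks",  # literal string
        "MLFLOW_EXPERIMENT_NAME": experiment,
        "MLFLOW_TRACE_SAMPLING_RATIO": str(sampling_ratio),
    }
    if trace_timeout_seconds is not None:
        env["MLFLOW_TRACE_TIMEOUT_SECONDS"] = str(trace_timeout_seconds)
    return env
```

For example, `build_tracing_env("https://<workspace-host>", client_id, client_secret, "/Shared/voice-bot", sampling_ratio=0.25)` would trace roughly a quarter of requests; the secret itself should come from your container platform's secret store rather than the image.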

Limits to plan around (per workspace): 200 QPS trace creation, 25 QPS search, UC ingestion 200 traces/sec & 100 MB/sec per table → Tracing FAQ (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/faq).

Gotchas: async = fire-and-forget, so flush on graceful shutdown or you lose queued traces on container kill. Keep voice audio/full transcripts as artifacts/URIs and don't inline them in trace payloads (latency overhead climbs to 100 ms or more once a trace exceeds 1 MB). Production Monitoring Delta sync runs ~every 15 min, so use online judges for realtime alerting.
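One way to handle the flush-on-shutdown gotcha is a SIGTERM handler, sketched below. It assumes `mlflow.flush_trace_async_logging()` is available in your installed version; the helper names are hypothetical, and the flush callable is injectable so the handler can be exercised without MLflow installed.

```python
import signal
import sys
from typing import Callable


def flush_mlflow_traces() -> None:
    # Imported lazily so this module loads even where mlflow is not installed.
    import mlflow

    # Assumption: this API exists in the installed mlflow / mlflow-tracing version.
    mlflow.flush_trace_async_logging()


def make_flush_handler(
    flush: Callable[[], None] = flush_mlflow_traces,
    exit_after: bool = True,
):
    """Build a signal handler that drains queued traces, then optionally exits."""

    def handler(signum, frame):
        try:
            flush()
        finally:
            if exit_after:
                sys.exit(0)

    return handler


def install_flush_on_sigterm(flush: Callable[[], None] = flush_mlflow_traces) -> None:
    """Register the handler; call once from the main thread at startup."""
    signal.signal(signal.SIGTERM, make_flush_handler(flush))
```

Note that this only helps on graceful shutdown: orchestrators typically send SIGTERM first and SIGKILL after a grace period, and SIGKILL cannot be caught, so the grace period needs to be long enough for the flush to finish.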

3. Alternatives:

- Stay on Lakebase custom traces — fine, but you rebuild dashboards/search/judges yourself

- Recommended: MLflow → Databricks (above)

- OpenTelemetry collector → multi-sink — MLflow spans are OTel-compatible, good if you also want Datadog/Grafana or vendor-neutrality

- Self-hosted OSS MLflow + Postgres — only if data-residency forces it; you operate everything and lose UC/judges

Net recommendation: the external-deployment MLflow pattern with mlflow-tracing SDK + OAuth M2M + a UC-backed experiment. It's the documented path, gets you centralized observability in Databricks, and unlocks the LLM-as-judge eval layer (hallucination, PII, response-relevance) that a voice agent will want next.

WorksBuddy · 3 weeks ago

Hey Alessandro,

1. Technically feasible? Yes.

MLflow's tracking client is just HTTP. As long as your container can reach the Databricks-managed MLflow endpoint, you can send traces from anywhere. Set your workspace host and token as environment variables inside the container, point the tracking URI to Databricks, and you're connected.
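A minimal sketch of that wiring, using the variable names listed in MoJaMa's reply: a startup check that the container has what it needs, plus one traced function. The helper and function names are hypothetical, and `mlflow` is imported lazily so the check itself has no dependency on the SDK.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_EXPERIMENT_NAME",
)


def missing_tracing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_traced_handler():
    """Fail fast on bad config, then return a traced request handler."""
    missing = missing_tracing_vars()
    if missing:
        raise RuntimeError(f"Tracing not configured, missing: {', '.join(missing)}")

    import mlflow  # provided by the mlflow-tracing package

    @mlflow.trace
    def answer(user_text: str) -> str:
        # Placeholder for the real chatbot logic; inputs and outputs
        # of this call are recorded on the span.
        return f"echo: {user_text}"

    return answer
```

Calling `build_traced_handler()` once at startup and reusing the returned function per request keeps the configuration check out of the hot path.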

2. Production gotchas:

The biggest one for a voice chatbot: don't let MLflow calls block your response path. Send traces asynchronously so observability never adds to user-facing latency.

Use a service principal over a PAT for auth. PATs expire, and manual rotation will catch you off guard in production. Also sort out your network config early. If your ACI (Azure Container Instances) deployment and Databricks workspace are in different VNets, private endpoints or IP allow-listing need to be in place before anything else.

3. Alternative worth considering:

If latency is still a concern after going async, an OpenTelemetry collector running as a sidecar next to your application container gives you better control over batching and buffering before traces hit the network. You can export via OTLP to Databricks without the direct MLflow overhead.

The hybrid architecture you're describing is a common pattern now. MLflow as the centralized observability layer is a reasonable call. Just keep the tracing async and auth service-principal-based before you go live.

Good luck with the build.
