Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

MLFlow tracking from Azure Container Instance

Ale_Armillotta
Valued Contributor II

Hi everyone,

we're building a voice chatbot for a customer using a mix of technologies — Databricks, Azure AI Foundry, and a few external containerized services.

Currently, we're tracking requests and logs via Lakebase with custom traces, but I'm now evaluating whether it makes sense to shift to MLflow (Databricks-managed) for tracing instead.

I came across this tutorial on connecting an external environment to MLflow: 👉 Connect your dev environment to MLflow – Databricks Docs

The guide focuses on local/dev setups, but our use case is different:

  • The chatbot runs in an external container (not inside Databricks)
  • We need to track GenAI traces (inputs, outputs, latency, etc.) in production
  • We want a centralized observability layer directly in Databricks

My questions:

  1. Is it technically feasible to send MLflow traces from an external production container to a Databricks-managed MLflow instance?
  2. Is this approach recommended for production, or are there known limitations/gotchas?
  3. Any alternative patterns you'd suggest for GenAI observability in this kind of hybrid architecture?

Thanks in advance 🙏

Alessandro


2 replies

WorksBuddy
New Contributor II

Hey Alessandro,

1. Technically feasible? Yes.

MLflow's tracking client is just HTTP. As long as your container can reach the Databricks-managed MLflow endpoint, you can send traces from anywhere. Set your workspace host and token as environment variables inside the container, point the tracking URI to Databricks, and you're connected.
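A minimal sketch of a startup check you could run inside the container before serving traffic. It assumes PAT-style auth and the environment variable names used in the Databricks external-tracing guide (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`, `MLFLOW_TRACKING_URI`); `check_tracking_env` is a hypothetical helper that only inspects configuration and never contacts the workspace.

```python
from typing import List, Mapping
from urllib.parse import urlparse

# Variable names as used in the Databricks external-tracing guide (PAT-based auth).
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "MLFLOW_TRACKING_URI")


def check_tracking_env(env: Mapping[str, str]) -> List[str]:
    """Return configuration problems; an empty list means the settings look sane.

    Hypothetical helper: it inspects the environment only, so it cannot
    detect network (VNet, allow-list) or permission problems.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    host = env.get("DATABRICKS_HOST", "")
    if host and urlparse(host).scheme != "https":
        problems.append("DATABRICKS_HOST should be an https:// workspace URL")

    uri = env.get("MLFLOW_TRACKING_URI", "")
    if uri and not (uri == "databricks" or uri.startswith("databricks://")):
        problems.append("MLFLOW_TRACKING_URI should point at Databricks, e.g. 'databricks'")

    return problems
```

Call it with `os.environ` at startup and exit if the list is non-empty, so a misconfigured container fails loudly instead of silently dropping traces.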

2. Production gotchas:

The biggest one for a voice chatbot: don't let MLflow calls block your response path. Send traces asynchronously so observability never adds to user-facing latency.
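To make "don't block the response path" concrete, here is a minimal stdlib sketch of the pattern: the request handler only enqueues, and a daemon thread does the slow export. `BackgroundExporter` and its `export_fn` callback are hypothetical stand-ins for whatever actually ships the trace; this is not MLflow's internal implementation.

```python
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel telling the worker to exit


class BackgroundExporter:
    """Hand traces to a worker thread so the caller never waits on the network."""

    def __init__(self, export_fn: Callable[[Any], None], max_queue: int = 1000):
        self._export_fn = export_fn  # slow call, e.g. an HTTP POST
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_queue)
        self.dropped = 0  # traces discarded because the queue was full (approximate)
        # Daemon thread: anything still queued is lost if close() is never called.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, trace: Any) -> None:
        """Non-blocking: drop the trace rather than delay the user response."""
        try:
            self._queue.put_nowait(trace)
        except queue.Full:
            self.dropped += 1

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is _STOP:
                return
            try:
                self._export_fn(item)
            except Exception:
                # Observability must never take the app down; a real version
                # would count or log these failures.
                pass

    def close(self, timeout: float = 5.0) -> None:
        """Let the worker drain what is queued, then stop it (call on shutdown)."""
        try:
            self._queue.put(_STOP, timeout=timeout)
        except queue.Full:
            return  # worker is stuck; give up rather than hang shutdown
        self._worker.join(timeout)
```

The bounded queue is the key design choice: under a traffic spike you lose some traces instead of adding latency to the voice response.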

Use a service principal instead of a PAT for auth. PATs expire, and manual rotation will catch you off guard in production. Also sort out your network config early: if your ACI and Databricks workspace are in different VNets, you need private endpoints or IP allow-listing in place before anything else.
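For the service-principal route, Databricks OAuth machine-to-machine auth uses the client-credentials grant against the workspace's `/oidc/v1/token` endpoint with `scope=all-apis`. The sketch below assumes that flow, builds (but does not send) the token request, and caches the token until shortly before it expires. The helper names are hypothetical; in practice the Databricks SDK and MLflow should handle this for you when `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET` are set, so treat this as an illustration of why expiry stops being a manual chore.

```python
import base64
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple


def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Client-credentials request for a workspace-level OAuth token (not sent here)."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url=host.rstrip("/") + "/oidc/v1/token",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


class CachedToken:
    """Reuse a token until `skew` seconds before expiry, then fetch a fresh one."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]], skew: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # hypothetical: returns (access_token, expires_in_seconds)
        self._skew = skew
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```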

3. Alternative worth considering:

If latency is still a concern after going async, an OpenTelemetry Collector running as a sidecar alongside your container gives you better control over batching and buffering before traces hit the network. You can export via OTLP to Databricks without the direct MLflow overhead.
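The batching a collector gives you boils down to "flush when the batch is full or when it is old enough". A minimal stdlib sketch of that policy, assuming a hypothetical `send_batch` callback standing in for the OTLP export; in a real deployment the Collector's batch processor (or the OpenTelemetry SDK's `BatchSpanProcessor`) implements this for you.

```python
import time
from typing import Any, Callable, List


class SizeOrAgeBatcher:
    """Buffer spans; flush at `max_batch` items or when the oldest is `max_age` s old.

    Single-threaded sketch: add locking before sharing it across threads.
    """

    def __init__(self, send_batch: Callable[[List[Any]], None], max_batch: int = 512,
                 max_age: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self._send = send_batch
        self._max_batch = max_batch
        self._max_age = max_age
        self._clock = clock
        self._buf: List[Any] = []
        self._first_at = 0.0  # arrival time of the oldest buffered span

    def add(self, span: Any) -> None:
        if not self._buf:
            self._first_at = self._clock()
        self._buf.append(span)
        if len(self._buf) >= self._max_batch:
            self.flush()

    def tick(self) -> None:
        """Call periodically (e.g. from a timer) to flush batches that are too old."""
        if self._buf and self._clock() - self._first_at >= self._max_age:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []
            self._send(batch)
```

Fewer, larger requests amortize connection and auth overhead, at the cost of up to `max_age` seconds of delay before a trace becomes visible.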

The hybrid architecture you're describing is a common pattern now. MLflow as the centralized observability layer is a reasonable call. Just keep the tracing async and auth service-principal-based before you go live.

Good luck with the build.

MoJaMa
Databricks Employee

Similar response to the one from WorksBuddy.  

Short answer: yes, it's supported, and there's a specific Databricks guide for your case.

  The tutorial you found is for local/IDE setups; the production-container equivalent is here: Trace agents deployed outside of Databricks (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/prod-tracing-external). It uses the same wiring (four env vars + the mlflow-tracing SDK).

  1. Feasibility: yes. Install mlflow-tracing and set DATABRICKS_HOST, DATABRICKS_TOKEN, MLFLOW_TRACKING_URI=databricks (the literal string) and MLFLOW_EXPERIMENT_NAME. Traces ship over HTTPS, and async logging is on by default, so logging stays off your request path. Your container, Databricks components, and Azure AI Foundry can be stitched into one trace via W3C TraceContext header propagation → Distributed Tracing (https://mlflow.org/docs/latest/genai/tracing/app-instrumentation/distributed-tracing).
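The W3C Trace Context header that makes this stitching work is plain text: `traceparent: 00-<32-hex trace-id>-<16-hex parent-id>-<2-hex flags>`. A minimal stdlib sketch of building and parsing it, e.g. when your container calls a downstream service and wants that service's spans to join the same trace. The function names are hypothetical and only version `00` is handled; MLflow's distributed-tracing support is meant to do this for you, so this just shows what travels on the wire.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 only: version-dash-traceid-dash-parentid-dash-flags.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str   # 32 lowercase hex chars, shared by every span in the trace
    parent_id: str  # 16 lowercase hex chars, the caller's span id
    sampled: bool   # bit 0 of trace-flags


def make_traceparent(trace_id: Optional[str] = None, span_id: Optional[str] = None,
                     sampled: bool = True) -> str:
    """Build a version-00 traceparent value; generates random ids if none are given."""
    trace_id = trace_id or secrets.token_hex(16)
    span_id = span_id or secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if malformed (the callee then starts a new trace)."""
    m = _TRACEPARENT_RE.match(value.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None  # all-zero ids are invalid per the spec
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

Usage: the caller sends `{"traceparent": make_traceparent(trace_id, current_span_id)}` with its HTTP request, and the callee parses it to attach its spans under the caller's span.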

  2. Recommended for prod with these upgrades over the dev tutorial:

  - OAuth M2M service principal instead of PAT → docs (https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m)

  - mlflow-tracing package (~5 MB) instead of full mlflow (~1 GB); don't install both → docs (https://mlflow.org/docs/latest/genai/tracing/lightweight-sdk)

  - Unity Catalog–backed experiment to escape the 100K-trace cap and 1,000-trace search ceiling

  - Tune MLFLOW_TRACE_SAMPLING_RATIO, async worker/queue sizes, and MLFLOW_TRACE_TIMEOUT_SECONDS → Production Tracing (https://mlflow.org/docs/latest/genai/tracing/prod-tracing)
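`MLFLOW_TRACE_SAMPLING_RATIO` is the fraction (0 to 1) of requests to trace. To reason about what a given value keeps, here is a minimal stdlib sketch of deterministic, trace-id-based head sampling that reads the same variable. It illustrates ratio sampling in general; it is not MLflow's internal sampler, whose exact behavior may differ.

```python
import math
import os
from typing import Mapping


def sampling_ratio(env: Mapping[str, str] = os.environ) -> float:
    """Read MLFLOW_TRACE_SAMPLING_RATIO; default 1.0 (trace everything), clamp to [0, 1]."""
    try:
        ratio = float(env.get("MLFLOW_TRACE_SAMPLING_RATIO", "1.0"))
    except ValueError:
        return 1.0  # unparsable value: fail open and keep tracing
    if math.isnan(ratio):
        return 1.0
    return min(max(ratio, 0.0), 1.0)


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace iff the low 64 bits of its id fall below ratio * 2**64.

    Deciding from the id (not a fresh random draw) means every service that
    sees the same trace id makes the same keep/drop decision.
    """
    if ratio >= 1.0:
        return True
    if ratio <= 0.0:
        return False
    low64 = int(trace_id_hex[-16:], 16)
    return low64 < int(ratio * 2**64)
```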

  Limits to plan around (per workspace): 200 QPS trace creation, 25 QPS search, UC ingestion 200 traces/sec & 100 MB/sec per table → Tracing FAQ (https://docs.databricks.com/aws/en/mlflow3/genai/tracing/faq).
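To see whether those limits bite, multiply your peak request rate by the sampling ratio and the average trace size. A back-of-envelope sketch using the per-workspace trace-creation limit and per-table UC ingestion limit quoted above; `Load` and `headroom` are hypothetical helpers, and any traffic figures you plug in are your own estimates.

```python
from dataclasses import dataclass
from typing import Dict

# Limits as quoted in the reply; check the Tracing FAQ for current values.
TRACE_CREATE_QPS_LIMIT = 200.0       # traces/sec per workspace
UC_INGEST_MB_PER_SEC_LIMIT = 100.0   # MB/sec per UC table


@dataclass
class Load:
    peak_requests_per_sec: float
    sampling_ratio: float = 1.0
    avg_trace_mb: float = 0.05       # placeholder; measure your real trace size


def headroom(load: Load) -> Dict[str, float]:
    """Fraction of each limit consumed at peak (values above 1.0 mean over the limit)."""
    traces_per_sec = load.peak_requests_per_sec * load.sampling_ratio
    mb_per_sec = traces_per_sec * load.avg_trace_mb
    return {
        "trace_qps": traces_per_sec,
        "trace_qps_used": traces_per_sec / TRACE_CREATE_QPS_LIMIT,
        "uc_mb_per_sec_used": mb_per_sec / UC_INGEST_MB_PER_SEC_LIMIT,
    }
```

For example, a hypothetical 300 requests/s at peak with a 0.5 sampling ratio creates 150 traces/s, which is 75% of the workspace limit; that is the point where lowering the ratio becomes worth considering.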

  Gotchas: async logging is fire-and-forget, so flush on graceful shutdown or you lose queued traces when the container is killed. Keep voice audio and full transcripts as artifacts/URIs rather than inlining them in trace payloads (latency overhead can climb to 100 ms+ for payloads above 1 MB). Production Monitoring Delta sync runs roughly every 15 minutes, so use online judges for real-time alerting.
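Both gotchas can be handled in a few lines. The stdlib sketch below uses hypothetical callbacks: `flush_traces` stands in for whatever flushes your tracing client's async queue (check your MLflow version's docs for the exact call), and `externalize` stands in for uploading audio or transcripts to storage and returning a URI. It also assumes the platform sends SIGTERM before killing the container, as Docker-style runtimes generally do.

```python
import json
import signal
import sys
from typing import Any, Callable, Dict

# Rough threshold from the reply: overhead climbs for payloads above ~1 MB.
# Checking per field (as here) is a simplification of a per-trace budget.
MAX_INLINE_BYTES = 1_000_000


def slim_payload(payload: Dict[str, Any],
                 externalize: Callable[[str, bytes], str]) -> Dict[str, Any]:
    """Replace audio bytes and oversized fields with a {"uri", "bytes"} reference."""
    slim: Dict[str, Any] = {}
    for key, value in payload.items():
        if isinstance(value, bytes):
            raw = value  # e.g. raw audio: never inline it
        else:
            raw = json.dumps(value, default=str).encode("utf-8")
        if isinstance(value, bytes) or len(raw) > MAX_INLINE_BYTES:
            slim[key] = {"uri": externalize(key, raw), "bytes": len(raw)}
        else:
            slim[key] = value
    return slim


def install_sigterm_flush(flush_traces: Callable[[], None]) -> None:
    """Flush queued traces on SIGTERM, then exit.

    With the default SIGTERM handling the interpreter dies without running
    atexit hooks, so queued traces would be lost. Call from the main thread.
    """
    def _handler(signum, frame):
        try:
            flush_traces()
        finally:
            sys.exit(0)

    signal.signal(signal.SIGTERM, _handler)
```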

  3. Alternatives:

  - Stay on Lakebase custom traces — fine, but you rebuild dashboards/search/judges yourself

  - Recommended: MLflow → Databricks (above)

  - OpenTelemetry collector multi-sink — MLflow spans are OTel-compatible, good if you also want Datadog/Grafana or vendor-neutrality

  - Self-hosted OSS MLflow + Postgres — only if data-residency forces it; you operate everything and lose UC/judges
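The multi-sink option is essentially fan-out with failure isolation: every batch goes to each backend, and one backend being down does not stop the others. In practice you would configure this as multiple exporters in an OpenTelemetry Collector pipeline; the stdlib sketch below only illustrates the behavior, and the sink names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Sink = Callable[[List[Any]], None]


def fan_out(batch: List[Any], sinks: Dict[str, Sink]) -> Dict[str, str]:
    """Send `batch` to every sink; return a per-sink status instead of raising."""
    status: Dict[str, str] = {}
    for name, send in sinks.items():
        try:
            send(batch)
            status[name] = "ok"
        except Exception as exc:  # isolate failures so one sink can't block the rest
            status[name] = f"error: {type(exc).__name__}"
    return status
```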

  Net recommendation: the external-deployment MLflow pattern with mlflow-tracing SDK + OAuth M2M + a UC-backed experiment. It's the documented path, gets you centralized observability in Databricks, and unlocks the LLM-as-judge eval layer (hallucination, PII, response-relevance) that a voice agent will want next.