I'm building a custom UI table (Next.js frontend, FastAPI backend) to display MLflow trace data from a Retrieval-Augmented Generation (RAG) application running on Databricks Managed MLflow 3.0. The table needs to show answer generation speed (from CHAT_MODEL spans), query processing time (from RETRIEVER spans), user ID, prompt template ID, and user feedback (a good/bad rating plus comments, submitted asynchronously via an API). I plan to store this data in a Delta Table to support sorting, filtering, and pagination, since the default MLflow trace limit (100,000 per workspace) and the search_traces() rate limit (25 QPS) may not scale for my use case.
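For context, this is the rough Delta table schema I have in mind (table and column names are my own placeholders, nothing MLflow-defined):

```python
# Placeholder schema for the unified UI table (all names are mine).
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.rag.traces_ui (
        trace_id            STRING,
        request_ts          TIMESTAMP,
        user_id             STRING,
        prompt_template_id  STRING,
        generation_ms       BIGINT,   -- duration of the CHAT_MODEL span
        retrieval_ms        BIGINT,   -- duration of the RETRIEVER span
        rating              STRING,   -- 'good' / 'bad', arrives asynchronously
        comment             STRING
    ) USING DELTA
""")
```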
I'm considering two approaches to sync MLflow traces and user feedback into the Delta Table:
1. Streaming with Delta Live Tables (DLT): Stream new traces and feedback into a Delta Table for near-real-time updates, given that feedback can arrive anytime after a query. Is DLT the best approach for this, and how can I efficiently ingest MLflow traces (stored in the managed backend) and feedback (stored in a separate Delta Table) into a unified Delta Table? Are there best practices for setting up the pipeline to handle high trace volumes and asynchronous feedback?
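To make option 1 concrete, here's a minimal DLT sketch of what I'm imagining. It assumes traces have already been landed in a Delta table I maintain (raw_traces, e.g. via a periodic export job, since I'm not aware of a way to stream directly out of the managed MLflow backend) and that feedback is written to raw_feedback; both names are placeholders. I've written the join as a full recompute rather than a stream-stream join, because late-arriving feedback would otherwise force watermark gymnastics:

```python
import dlt

@dlt.table(comment="Unified traces + feedback backing the UI table")
def traces_with_feedback():
    # Batch reads: DLT recomputes this table on each pipeline update,
    # which sidesteps stream-stream join watermarks for late feedback.
    traces = dlt.read("raw_traces")      # placeholder: traces exported from MLflow
    feedback = dlt.read("raw_feedback")  # placeholder: async user feedback
    return traces.join(feedback, on="trace_id", how="left")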
2. Scheduled Cron Job: Use a Databricks Workflow to periodically fetch new traces and feedback, merging them into the Delta Table. Would this be sufficient for a high-volume application, or will the search_traces() rate limit cause issues?
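For option 2, this is the merge pattern I'd expect the job to run on each trigger. The search_traces() filter grammar and the column names of the returned DataFrame are my assumptions and would need checking against the docs:

```python
import mlflow
from delta.tables import DeltaTable

def sync_new_traces(spark, experiment_id: str, last_ts_ms: int):
    # Fetch traces newer than the last sync watermark (pandas DataFrame).
    pdf = mlflow.search_traces(
        experiment_ids=[experiment_id],
        # Assumption: timestamp filtering works like this; verify the filter grammar.
        filter_string=f"attributes.timestamp_ms > {last_ts_ms}",
        order_by=["timestamp_ms ASC"],
    )
    if pdf.empty:
        return
    # In practice I'd project scalar columns first; span objects may not serialize.
    src = spark.createDataFrame(pdf)
    # Idempotent upsert keyed on trace_id, so re-running the job is safe.
    (DeltaTable.forName(spark, "main.rag.traces_ui")
        .alias("t")
        .merge(src.alias("s"), "t.trace_id = s.trace_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
```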
Key questions:
• Does MLflow or Databricks provide a native way to stream traces to a Delta Table, or do I need to export traces to an intermediate format (e.g., Parquet on DBFS)?
• How can I efficiently associate asynchronous user feedback (linked by trace_id) with traces in the Delta Table? (See the MERGE sketch after this list.)
• Are there performance considerations or best practices for querying the Delta Table with Spark SQL or Databricks SQL to power a responsive UI table with sorting, filtering, and pagination? (See the pagination sketch after this list.)
• If I approach the 100,000-trace limit, what's the process for requesting an increase, and how does it impact streaming or batch syncing?
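For the feedback question above, the pattern I have in mind is a second MERGE that only touches the feedback columns, so trace rows and feedback rows can arrive in either order (again, table and column names are placeholders):

```python
from delta.tables import DeltaTable

def upsert_feedback(spark, feedback_df):
    # feedback_df columns: trace_id, rating, comment - written by the feedback API.
    (DeltaTable.forName(spark, "main.rag.traces_ui")
        .alias("t")
        .merge(feedback_df.alias("f"), "t.trace_id = f.trace_id")
        .whenMatchedUpdate(set={"rating": "f.rating", "comment": "f.comment"})
        # If feedback can land before its trace, insert a stub row instead of dropping it.
        .whenNotMatchedInsert(values={
            "trace_id": "f.trace_id",
            "rating": "f.rating",
            "comment": "f.comment",
        })
        .execute())
```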
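And for the UI query itself, I'm leaning toward keyset pagination rather than OFFSET, roughly like this (f-strings only for readability; the real FastAPI endpoint would use bound parameters):

```python
def fetch_page(spark, last_ts: str, last_trace_id: str, page_size: int = 50):
    # Keyset pagination: seek past the last row of the previous page
    # instead of OFFSET, which rescans every skipped row.
    return spark.sql(f"""
        SELECT trace_id, request_ts, user_id, prompt_template_id,
               generation_ms, retrieval_ms, rating, comment
        FROM main.rag.traces_ui
        WHERE request_ts < '{last_ts}'
           OR (request_ts = '{last_ts}' AND trace_id < '{last_trace_id}')
        ORDER BY request_ts DESC, trace_id DESC
        LIMIT {page_size}
    """)
```

Does that hold up at high volume, or would clustering the table on request_ts (or some other layout) be the better lever here?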
Any example code, pipeline configurations, or recommendations for DLT vs. cron jobs would be greatly appreciated. Thanks!