Data Engineering

How to Efficiently Sync MLflow Traces and Asynchronous User Feedback with a Delta Table

getsome
New Contributor

I’m building a custom UI table (using Next.js and FastAPI) to display MLflow trace data from a Retrieval-Augmented Generation (RAG) application running on Databricks Managed MLflow 3.0. The table needs to show answer generation speed (from CHAT_MODEL spans), query processing time (from RETRIEVER spans), user ID, prompt template ID, and user feedback (a good/bad rating plus comments, submitted asynchronously via an API). I plan to store this data in a Delta Table to support sorting, filtering, and pagination, since the default MLflow trace limit (100,000 per workspace) and the search_traces() rate limit (25 QPS) may not scale for my use case.
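To make this concrete, here is a minimal sketch of the fetch-and-flatten step I have in mind. The experiment name ("/Shared/rag-app") and target table ("main.rag.trace_metrics") are hypothetical, and it assumes user_id and prompt_template_id are logged as trace tags; attribute and filter names may differ slightly across MLflow versions.

```python
import mlflow
from mlflow import MlflowClient
from mlflow.entities import SpanType

client = MlflowClient()
experiment = mlflow.get_experiment_by_name("/Shared/rag-app")  # hypothetical name

rows, page_token = [], None
while True:
    page = client.search_traces(
        experiment_ids=[experiment.experiment_id],
        # Only traces newer than the last sync; exact filter syntax varies by MLflow version.
        filter_string="attributes.timestamp_ms > 1700000000000",
        max_results=500,
        page_token=page_token,
    )
    for trace in page:
        def span_ms(span_type):
            # Total duration of all spans of the given type, in milliseconds.
            spans = [s for s in trace.data.spans if s.span_type == span_type]
            return sum(s.end_time_ns - s.start_time_ns for s in spans) / 1e6 if spans else None

        rows.append({
            "trace_id": trace.info.trace_id,
            "timestamp_ms": trace.info.timestamp_ms,
            "chat_model_ms": span_ms(SpanType.CHAT_MODEL),  # answer generation speed
            "retriever_ms": span_ms(SpanType.RETRIEVER),    # query processing time
            "user_id": trace.info.tags.get("user_id"),                        # assumed tag
            "prompt_template_id": trace.info.tags.get("prompt_template_id"),  # assumed tag
        })
    page_token = page.token
    if not page_token:
        break

if rows:
    spark.createDataFrame(rows).write.mode("append").saveAsTable("main.rag.trace_metrics")
```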

I’m considering two approaches to sync MLflow traces and user feedback with the Delta Table:

1.  Streaming with Delta Live Tables (DLT): Stream new traces and feedback into a Delta Table for near-real-time updates, since feedback can arrive at any time after a query. Is DLT the best fit here, and how can I efficiently ingest MLflow traces (stored in the managed backend) and feedback (stored in a separate Delta Table) into a unified Delta Table? Are there best practices for setting up the pipeline to handle high trace volumes and asynchronous feedback? (A DLT sketch follows this list.)

2.  Scheduled Cron Job: Use a Databricks Workflow to periodically fetch new traces and feedback (as in the batch sketch above) and merge them into the Delta Table. Would this be sufficient for a high-volume application, or would the search_traces() rate limit cause issues?
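For option 1, here is a minimal DLT sketch under the assumptions above: traces already land in a Delta table (e.g., via the batch sync), and the feedback API writes to a "main.rag.feedback" table with a trace_id column (both names hypothetical). Since DLT can't read the managed MLflow backend directly, a Delta table serves as the streaming source, and the join is a materialized view that is recomputed on each update, so late-arriving feedback is picked up naturally.

```python
import dlt

@dlt.table(comment="Trace metrics streamed from the synced Delta table")
def traces_bronze():
    return spark.readStream.table("main.rag.trace_metrics")  # hypothetical name

@dlt.table(comment="User feedback streamed from the table the API writes to")
def feedback_bronze():
    return spark.readStream.table("main.rag.feedback")  # hypothetical name

@dlt.table(comment="Traces enriched with feedback, keyed by trace_id")
def traces_with_feedback():
    # Batch reads make this a materialized view: it is recomputed on each
    # pipeline update, so feedback arriving long after the trace still joins.
    return dlt.read("traces_bronze").join(
        dlt.read("feedback_bronze"), on="trace_id", how="left"
    )
```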

Key questions:

•  Does MLflow or Databricks provide a native way to stream traces to a Delta Table, or do I need to export traces to an intermediate format (e.g., Parquet on DBFS)?

•  How can I efficiently associate asynchronous user feedback (linked by trace_id) with traces in the Delta Table? (See the MERGE sketch after these questions.)

•  Are there performance considerations or best practices for querying the Delta Table with Spark SQL or Databricks SQL to power a responsive UI table with sorting, filtering, and pagination?

•  If I approach the 100,000-trace limit, what’s the process for requesting an increase, and how does it impact streaming or batch syncing?
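On the feedback-association and pagination questions, a sketch using the same hypothetical table names: a MERGE keyed on trace_id upserts feedback whenever it arrives (assuming trace_metrics was created with nullable rating and comment columns, and that there is at most one feedback row per trace_id; dedupe upstream if users can revise feedback), and keyset pagination avoids the full-scan cost of OFFSET for the UI.

```python
# Upsert asynchronous feedback into the unified table; runs fine on a schedule.
spark.sql("""
    MERGE INTO main.rag.trace_metrics AS t
    USING main.rag.feedback AS f
    ON t.trace_id = f.trace_id
    WHEN MATCHED THEN UPDATE SET
        t.rating  = f.rating,
        t.comment = f.comment
""")

# Keyset (seek) pagination for the UI: filter past the last-seen cursor instead
# of OFFSET, which rescans all skipped rows. Parameterized spark.sql requires
# Spark 3.4+ / DBR 12.1+.
page = spark.sql("""
    SELECT trace_id, timestamp_ms, chat_model_ms, retriever_ms, user_id, rating
    FROM main.rag.trace_metrics
    WHERE timestamp_ms < :cursor   -- cursor = timestamp of the previous page's last row
    ORDER BY timestamp_ms DESC
    LIMIT 50
""", args={"cursor": 1700000000000})
```

Z-ORDER or liquid clustering on timestamp_ms (plus common filter columns such as user_id) should keep these page queries responsive as the table grows.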

Any example code, pipeline configurations, or recommendations for DLT vs. cron jobs would be greatly appreciated. Thanks!
