<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to Efficiently Sync MLflow Traces and Asynchronous User Feedback with a Delta Table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-efficiently-sync-mlflow-traces-and-asynchronous-user/m-p/133390#M49828</link>
    <description>&lt;P&gt;Hello! Here are the answers to your questions:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;- Yes! See&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/prod-tracing" target="_self"&gt;Databricks managed MLflow tracing&lt;/A&gt;: enable production monitoring or the endpoint configuration to collect traces in a Delta table.&lt;/P&gt;
&lt;P&gt;- We have example code for &lt;A href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/collect-user-feedback#implementing-feedback-collection" target="_blank"&gt;implementing async feedback collection&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;- Definitely. See a comprehensive summary of considerations &lt;A href="https://www.databricks.com/discover/pages/optimize-data-workloads-guide#data-caching" target="_blank"&gt;here&lt;/A&gt;. From there you can drill down into the specific pieces of your workflow that you want to speed up, using caching and other techniques.&lt;/P&gt;
&lt;P&gt;- To request an increase, raise it directly with your Account Team. If you do not have one, file a formal request through Databricks support. As for the impact on syncing: as you near the 100,000-trace limit in MLflow, new traces may be rejected if rolling deletion isn’t enabled, which disrupts both streaming and batch syncing. If rolling deletion is active, old traces are purged to make room for new ones, which reduces how much history is retained. High trace volumes near the limit can also slow performance or cause errors in streaming and batch jobs. Scaling resources and enabling rolling deletion help minimize these issues.&lt;/P&gt;
&lt;P&gt;I hope this helps.&lt;/P&gt;
&lt;P&gt;Best,&lt;BR /&gt;Sarah&lt;/P&gt;</description>
    <pubDate>Tue, 30 Sep 2025 14:41:59 GMT</pubDate>
    <dc:creator>sarahbhord</dc:creator>
    <dc:date>2025-09-30T14:41:59Z</dc:date>
    <item>
      <title>How to Efficiently Sync MLflow Traces and Asynchronous User Feedback with a Delta Table</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-efficiently-sync-mlflow-traces-and-asynchronous-user/m-p/124965#M47302</link>
      <description>&lt;P&gt;I’m building a custom UI table (using Next.js and FastAPI) to display MLflow trace data from a Retrieval-Augmented Generation (RAG) application running on Databricks Managed MLflow 3.0. The table needs to show answer generation speed (from CHAT_MODEL spans), query processing time (from RETRIEVER spans), user ID, prompt template ID, and user feedback (good/bad rating and comments, submitted asynchronously via an API). I’m planning to store this data in a Delta Table to support sorting, filtering, and pagination, as the default MLflow trace limit (100,000 per workspace) and search_traces() rate limit (25 QPS) may not scale for my use case.&lt;/P&gt;
&lt;P&gt;I’m considering two approaches to sync MLflow traces and user feedback with the Delta Table:&lt;/P&gt;
&lt;P&gt;1. Streaming with Delta Live Tables (DLT): Stream new traces and feedback into a Delta Table for near-real-time updates, given that feedback can arrive anytime after a query. Is DLT the best approach for this, and how can I efficiently ingest MLflow traces (stored in the managed backend) and feedback (stored in a separate Delta Table) into a unified Delta Table? Are there best practices for setting up the pipeline to handle high trace volumes and asynchronous feedback?&lt;/P&gt;
&lt;P&gt;2. Scheduled Cron Job: Use a Databricks Workflow to periodically fetch new traces and feedback, merging them into the Delta Table. Would this be sufficient for a high-volume application, or will the search_traces() rate limit cause issues?&lt;/P&gt;
&lt;P&gt;Key questions:&lt;/P&gt;
&lt;P&gt;• Does MLflow or Databricks provide a native way to stream traces to a Delta Table, or do I need to export traces to an intermediate format (e.g., Parquet on DBFS)?&lt;/P&gt;
&lt;P&gt;• How can I efficiently associate asynchronous user feedback (linked by trace_id) with traces in the Delta Table?&lt;/P&gt;
&lt;P&gt;• Are there performance considerations or best practices for querying the Delta Table with Spark SQL or Databricks SQL to power a responsive UI table with sorting, filtering, and pagination?&lt;/P&gt;
&lt;P&gt;• If I approach the 100,000-trace limit, what’s the process for requesting an increase, and how does it impact streaming or batch syncing?&lt;/P&gt;
&lt;P&gt;Any example code, pipeline configurations, or recommendations for DLT vs. cron jobs would be greatly appreciated. Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jul 2025 18:31:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-efficiently-sync-mlflow-traces-and-asynchronous-user/m-p/124965#M47302</guid>
      <dc:creator>getsome</dc:creator>
      <dc:date>2025-07-11T18:31:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to Efficiently Sync MLflow Traces and Asynchronous User Feedback with a Delta Table</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-efficiently-sync-mlflow-traces-and-asynchronous-user/m-p/133390#M49828</link>
      <description>&lt;P&gt;Hello! Here are the answers to your questions:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;- Yes! See&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/prod-tracing" target="_self"&gt;Databricks managed MLflow tracing&lt;/A&gt;: enable production monitoring or the endpoint configuration to collect traces in a Delta table.&lt;/P&gt;
&lt;P&gt;- We have example code for &lt;A href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/collect-user-feedback#implementing-feedback-collection" target="_blank"&gt;implementing async feedback collection&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;- Definitely. See a comprehensive summary of considerations &lt;A href="https://www.databricks.com/discover/pages/optimize-data-workloads-guide#data-caching" target="_blank"&gt;here&lt;/A&gt;. From there you can drill down into the specific pieces of your workflow that you want to speed up, using caching and other techniques.&lt;/P&gt;
&lt;P&gt;- To request an increase, raise it directly with your Account Team. If you do not have one, file a formal request through Databricks support. As for the impact on syncing: as you near the 100,000-trace limit in MLflow, new traces may be rejected if rolling deletion isn’t enabled, which disrupts both streaming and batch syncing. If rolling deletion is active, old traces are purged to make room for new ones, which reduces how much history is retained. High trace volumes near the limit can also slow performance or cause errors in streaming and batch jobs. Scaling resources and enabling rolling deletion help minimize these issues.&lt;/P&gt;
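&lt;P&gt;On associating feedback with traces by trace_id, here is a minimal plain-Python sketch of the idea, not Databricks or MLflow API code. It assumes each feedback event carries a trace_id plus hypothetical rating, comment, and submitted_at fields, and that the most recent feedback for a trace should win. Traces that have no feedback yet keep empty values.&lt;/P&gt;

```python
# Illustrative only: an in-memory stand-in for an upsert keyed on trace_id.
# The field names rating, comment and submitted_at are hypothetical.


def latest_feedback_by_trace(feedback_events):
    """Keep only the most recent feedback event for each trace_id."""
    latest = {}
    for event in feedback_events:
        trace_id = event["trace_id"]
        current = latest.get(trace_id)
        if current is None:
            latest[trace_id] = event
        else:
            # Compare on submitted_at so arrival order does not matter.
            latest[trace_id] = max(current, event, key=lambda e: e["submitted_at"])
    return latest


def attach_feedback(trace_rows, feedback_events):
    """Left-join feedback onto traces; traces without feedback get None."""
    latest = latest_feedback_by_trace(feedback_events)
    merged = []
    for row in trace_rows:
        feedback = latest.get(row["trace_id"], {})
        merged.append({
            **row,
            "rating": feedback.get("rating"),
            "comment": feedback.get("comment"),
        })
    return merged


traces = [
    {"trace_id": "tr-1", "user_id": "u1"},
    {"trace_id": "tr-2", "user_id": "u2"},
]
feedback = [
    {"trace_id": "tr-1", "rating": "bad", "comment": "too slow", "submitted_at": 1},
    {"trace_id": "tr-1", "rating": "good", "comment": "fixed", "submitted_at": 2},
]
rows = attach_feedback(traces, feedback)
```

&lt;P&gt;In a Delta table the same idea would typically be expressed as a merge keyed on trace_id, so that late-arriving feedback updates the existing row instead of appending a duplicate.&lt;/P&gt;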
&lt;P&gt;I hope this helps.&lt;/P&gt;
&lt;P&gt;Best,&lt;BR /&gt;Sarah&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2025 14:41:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-efficiently-sync-mlflow-traces-and-asynchronous-user/m-p/133390#M49828</guid>
      <dc:creator>sarahbhord</dc:creator>
      <dc:date>2025-09-30T14:41:59Z</dc:date>
    </item>
  </channel>
</rss>

