<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT Performance Issue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122047#M46635</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107564"&gt;@michelleliu&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This sawtooth pattern in DLT processing times is fairly common and usually points to one of a few underlying causes. Here are the most likely ones:&lt;/P&gt;&lt;P&gt;Common Causes&lt;BR /&gt;1. Memory Pressure &amp;amp; Garbage Collection&lt;/P&gt;&lt;P&gt;Processing times increase as executors accumulate cached data, shuffle files, or intermediate results.&lt;BR /&gt;This eventually triggers a major garbage collection or cleanup, which causes the "drop" back to baseline.&lt;BR /&gt;This is more common in streaming workloads that accumulate state over time.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;2. Checkpoint Growth&lt;/P&gt;&lt;P&gt;Streaming checkpoints grow over time, which makes checkpoint operations (including recovery) slower.&lt;BR /&gt;Periodic checkpoint cleanup then resets processing times to the faster baseline.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3. Auto-scaling Behavior&lt;/P&gt;&lt;P&gt;The cluster starts with enough resources, then gradually loses executors that the autoscaler considers idle.&lt;BR /&gt;It scales back up once performance degrades enough.&lt;BR /&gt;The "drop" corresponds to fresh executors joining.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;4. State Store Compaction&lt;/P&gt;&lt;P&gt;Stateful streaming operations (aggregations, stream-stream joins, deduplication) accumulate state files.&lt;BR /&gt;Periodic compaction and cleanup of those files resets performance.&lt;/P&gt;</description>
    <pubDate>Wed, 18 Jun 2025 00:37:32 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2025-06-18T00:37:32Z</dc:date>
    <item>
      <title>DLT Performance Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122029#M46629</link>
      <description>&lt;P&gt;I've been seeing a recurring pattern in DLT processing time across all my pipelines, as shown in the attached screenshot. Each data point is an "update", and the pipelines run in "continuous" mode. The processing time keeps increasing up to a point and then drops back to the desired level. This didn't follow any manual change, I don't see any correlation with hour of day or day of week, and it doesn't appear to be driven by the data feed either. Has anyone seen this, and can you help me understand why it happens and how to optimize it?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jun 2025 17:53:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122029#M46629</guid>
      <dc:creator>michelleliu</dc:creator>
      <dc:date>2025-06-17T17:53:06Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Performance Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122047#M46635</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107564"&gt;@michelleliu&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This sawtooth pattern in DLT processing times is fairly common and usually points to one of a few underlying causes. Here are the most likely ones:&lt;/P&gt;&lt;P&gt;Common Causes&lt;BR /&gt;1. Memory Pressure &amp;amp; Garbage Collection&lt;/P&gt;&lt;P&gt;Processing times increase as executors accumulate cached data, shuffle files, or intermediate results.&lt;BR /&gt;This eventually triggers a major garbage collection or cleanup, which causes the "drop" back to baseline.&lt;BR /&gt;This is more common in streaming workloads that accumulate state over time.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;2. Checkpoint Growth&lt;/P&gt;&lt;P&gt;Streaming checkpoints grow over time, which makes checkpoint operations (including recovery) slower.&lt;BR /&gt;Periodic checkpoint cleanup then resets processing times to the faster baseline.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3. Auto-scaling Behavior&lt;/P&gt;&lt;P&gt;The cluster starts with enough resources, then gradually loses executors that the autoscaler considers idle.&lt;BR /&gt;It scales back up once performance degrades enough.&lt;BR /&gt;The "drop" corresponds to fresh executors joining.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;4. State Store Compaction&lt;/P&gt;&lt;P&gt;Stateful streaming operations (aggregations, stream-stream joins, deduplication) accumulate state files.&lt;BR /&gt;Periodic compaction and cleanup of those files resets performance.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jun 2025 00:37:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122047#M46635</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-18T00:37:32Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Performance Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122181#M46682</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;! This is very insightful! I'd like to start with memory cleanup and GC. I did a quick Google search and didn't find anything solid. Should this be a job set up to run against the pipeline ID? Do you have any references?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jun 2025 18:25:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122181#M46682</guid>
      <dc:creator>michelleliu</dc:creator>
      <dc:date>2025-06-18T18:25:20Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Performance Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122193#M46690</link>
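      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107564"&gt;@michelleliu&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For DLT pipeline monitoring, there are several options depending on what you want to achieve. One is to record cluster metrics from inside the pipeline itself.&lt;/P&gt;&lt;P&gt;In-pipeline monitoring: try this&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Add this table to your existing DLT pipeline.
# Caveats:
# - PySpark's StatusTracker has no getExecutorInfos(), so this reaches the JVM
#   SparkStatusTracker through the internal _jsc handle, which may be blocked
#   on shared-access or serverless compute.
# - The function runs when DLT computes the table, so each refresh stores one
#   point-in-time snapshot (replacing the previous one), not a time series.
# - These are storage-memory figures (cache/broadcast), not JVM heap or GC metrics.
import dlt
from datetime import datetime, timezone

METRICS_SCHEMA = (
    "`timestamp` TIMESTAMP, pipeline_id STRING, update_id STRING, "
    "active_executors INT, total_memory_mb DOUBLE, used_memory_mb DOUBLE, "
    "memory_utilization DOUBLE"
)

@dlt.table(
    comment="Pipeline performance metrics",
    table_properties={"quality": "bronze"}
)
def pipeline_performance_metrics():
    tracker = spark.sparkContext._jsc.sc().statusTracker()
    # One SparkExecutorInfo per executor, plus one entry for the driver
    infos = list(tracker.getExecutorInfos())

    total_bytes = sum(i.totalOnHeapStorageMemory() + i.totalOffHeapStorageMemory() for i in infos)
    used_bytes = sum(i.usedOnHeapStorageMemory() + i.usedOffHeapStorageMemory() for i in infos)

    row = (
        datetime.now(timezone.utc),
        # Conf keys kept from the original post; returns None if a key is not set
        spark.conf.get("spark.databricks.pipelines.pipelineId", None),
        spark.conf.get("spark.databricks.pipelines.updateId", None),
        max(len(infos) - 1, 0),  # exclude the driver entry
        total_bytes / 1024 / 1024,
        used_bytes / 1024 / 1024,
        used_bytes / max(total_bytes, 1),
    )
    # Explicit schema, so a None-only column does not break type inference
    return spark.createDataFrame([row], schema=METRICS_SCHEMA)&lt;/LI-CODE&gt;</description>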
      <pubDate>Wed, 18 Jun 2025 23:23:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-performance-issue/m-p/122193#M46690</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-18T23:23:57Z</dc:date>
    </item>
  </channel>
</rss>

