4 weeks ago
I've been seeing a pattern in DLT processing time across all my pipelines, as in the attached screenshot. Each data point is an "update" on a pipeline set to "continuous" mode. The processing time keeps increasing up to a point, then drops back to where it should be. This wasn't caused by any manual change, I don't see any correlation to the hour of the day or day of the week, and it doesn't appear to be driven by the data feed either. Has anyone seen this, and could you help me understand why it happens and how to optimize?
Thanks!
4 weeks ago - last edited 4 weeks ago
Hi @michelleliu
This sawtooth pattern in DLT processing times is quite common and typically points to one of a few underlying issues. Here are the most likely causes:
Common Causes
1. Memory Pressure & Garbage Collection
Processing times increase as memory fills up with cached data, shuffle files, or intermediate results
Eventually triggers major garbage collection or memory cleanup, causing the "drop" back to baseline
More common with streaming workloads that accumulate state over time
2. Checkpoint Growth
Streaming checkpoints grow over time, making recovery operations slower
Periodic checkpoint cleanup causes the reset to faster times
3. Auto-scaling Behavior
Cluster starts with optimal resources, gradually loses executors due to perceived idle time
Eventually scales back up when performance degrades enough
The "drop" represents fresh executors joining
4. State Store Compaction
Stateful streaming operations accumulate state files
Periodic compaction/cleanup resets performance (see the sketch just below this list for one way to bound streaming state)
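If state growth or the related GC pressure turns out to be the driver, one common mitigation is to put an explicit watermark on stateful streaming transformations so old state can be dropped continuously instead of piling up until a cleanup kicks in. Here is a minimal sketch, assuming a hypothetical source table events_raw with event_time and event_type columns and a simple windowed count:

import dlt
from pyspark.sql import functions as F

@dlt.table(
    comment="Windowed event counts with bounded streaming state"
)
def events_windowed_counts():
    return (
        spark.readStream.table("events_raw")          # hypothetical source table
            # Allow late data for up to 30 minutes; window state older than the
            # watermark can be evicted instead of accumulating across updates.
            .withWatermark("event_time", "30 minutes")
            .groupBy(
                F.window("event_time", "5 minutes"),  # 5-minute tumbling windows
                "event_type"
            )
            .count()
    )

With a watermark in place, Structured Streaming can purge window state once it falls behind the watermark, which keeps the state store (and the memory/GC pressure it creates) roughly constant between updates rather than growing until a cleanup resets it.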
4 weeks ago
Thank you @lingareddy_Alva! This is very insightful! I want to start with cleaning up memory and GC. I did a quick Google search and didn't find anything very solid. Should it be a job set up to run against the pipeline ID? Do you have any reference?
4 weeks ago
Hi @michelleliu
For DLT pipeline monitoring, you have several options depending on what you want to achieve.
Custom metrics table inside the pipeline: try this
# Add this table to your existing DLT pipeline.
# Note: statusTracker().getExecutorInfos() mirrors the Scala SparkStatusTracker API;
# confirm it is exposed by PySpark on your runtime before relying on it.
import dlt
from datetime import datetime

@dlt.table(
    comment="Pipeline performance metrics",
    table_properties={"quality": "bronze"}
)
def pipeline_performance_metrics():
    def collect_metrics():
        # Snapshot executor count and memory utilization at update time
        executor_infos = spark.sparkContext.statusTracker().getExecutorInfos()
        workers = [e for e in executor_infos if e.executorId != "driver"]
        total_mem = sum(e.maxMemory for e in workers)
        used_mem = sum(e.memoryUsed for e in workers)
        return [{
            "timestamp": datetime.now(),
            # Pipeline/update IDs exposed via Spark conf (key names may vary by DLT runtime)
            "pipeline_id": spark.conf.get("spark.databricks.pipelines.pipelineId"),
            "update_id": spark.conf.get("spark.databricks.pipelines.updateId"),
            "active_executors": len(workers),
            "total_memory_mb": total_mem / 1024 / 1024,
            "used_memory_mb": used_mem / 1024 / 1024,
            "memory_utilization": used_mem / max(total_mem, 1),
        }]
    return spark.createDataFrame(collect_metrics())
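As a complement, or if you would rather not add tables to the pipeline itself, the DLT event log already records per-update and per-flow progress that you can query from a separate notebook or job. A rough sketch, assuming a pipeline whose event log is written under <storage-location>/system/events; the exact location and the fields inside details can vary by pipeline configuration and runtime version:

from pyspark.sql import functions as F

# Hypothetical storage location configured for the pipeline
storage_location = "dbfs:/pipelines/my_pipeline"

# The DLT event log is itself a Delta table
event_log = spark.read.format("delta").load(f"{storage_location}/system/events")

flow_progress = (
    event_log
        .where(F.col("event_type") == "flow_progress")
        .select(
            "timestamp",
            F.col("origin.flow_name").alias("flow_name"),
            # details is a JSON string; pull out a couple of metrics if present
            F.get_json_object("details", "$.flow_progress.status").alias("status"),
            F.get_json_object("details", "$.flow_progress.metrics.num_output_rows")
                .alias("num_output_rows"),
        )
)

display(flow_progress.orderBy("timestamp"))

For Unity Catalog pipelines the event log is exposed through the event_log() table-valued function in SQL rather than a storage path, but the idea is the same: event_type, origin, and details give you a per-update timeline you can correlate with the sawtooth you are seeing.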