We have a Databricks pipeline in which a near real-time job reads from several Silver tables to detect PK/FK changes and trigger updates to Gold tables. Normally, this job has ~3 minutes of latency per micro-batch.
Recently, we noticed that each batch is running much slower (10–20+ minutes), even after scaling from N2-8 to N2-16 for both the driver and the executors, with no noticeable improvement.
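For context, the job is structured roughly like the sketch below. Everything here is a placeholder (table names, PK column, checkpoint path, the upsert_to_gold helper), not our real code; it just shows the CDF-driven micro-batch pattern we use:

```python
# Minimal sketch, assuming a Databricks notebook where `spark` is predefined.
from delta.tables import DeltaTable

def upsert_to_gold(batch_df, batch_id):
    # Placeholder MERGE: apply one CDF micro-batch to Gold, keyed on the PK.
    # (Per-row _change_type handling, e.g. deletes, omitted for brevity.)
    gold = DeltaTable.forName(spark, "gold.orders")
    (gold.alias("g")
         .merge(batch_df.alias("s"), "g.order_id = s.order_id")
         .whenMatchedUpdateAll()
         .whenNotMatchedInsertAll()
         .execute())

(spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")   # stream Change Data Feed rows from one Silver source
    .table("silver.orders")
    .writeStream
    .option("checkpointLocation", "/checkpoints/gold_orders")
    .foreachBatch(upsert_to_gold)
    .trigger(processingTime="1 minute")
    .start())
```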
I extracted job metrics and sorted them by AvgCompletionTime. The top bottlenecks are:
- Deletion vector operations (number of deletion vector rows read, time spent reading deletion vectors, etc.), with ~6,100–6,300 seconds average completion time per task.
- Unified cache operations (unified cache populate time for ParquetFooter, unified cache populate time for RangeDeletionVector, unified cache read/serve bytes, ...), with ~5,000 seconds average completion time per task.
From the metrics, it seems that:
- A lot of tasks are reading from the disk cache (unified cache) but also spending significant time populating the cache.
- Many Parquet footers, column chunks, and deletion vectors are being cached on the fly.
- The cache hit ratio may be low due to constantly reading new data from CDF.
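One experiment I'm considering to test the low-hit-rate theory: rerun a single batch with the disk cache disabled and compare the scan-time metrics against a cached run. This assumes the unified cache is governed by the standard Databricks disk cache flag, which may not hold:

```python
# Assumption: the unified cache honors the documented disk cache setting.
# Disable it for the session, rerun one micro-batch, then compare
# "scan time" and "unified cache populate time" against a cached run.
spark.conf.set("spark.databricks.io.cache.enabled", "false")
```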
My questions are:
- In a near real-time workload where data changes frequently, could the unified cache cause more overhead than benefit (due to a low hit rate)?
- Are there recommended strategies to reduce deletion vector overhead in Delta tables (e.g., OPTIMIZE, VACUUM, rewriting files)? I've sketched what I'm considering right after this list.
- Any tuning tips for improving batch performance in this case?
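For the deletion vector question, this is roughly what I plan to test on the Silver source tables (the table name is a placeholder; REORG ... APPLY (PURGE), VACUUM, and the delta.enableDeletionVectors property are documented Delta Lake/Databricks commands, but whether they are the right fix here is exactly what I'm asking):

```python
# Rewrite data files to physically purge rows soft-deleted via deletion
# vectors, then clean up the superseded files (subject to the retention period).
spark.sql("REORG TABLE silver.orders APPLY (PURGE)")
spark.sql("VACUUM silver.orders")

# Optionally stop new deletion vectors from being created on this table;
# subsequent DELETE/UPDATE/MERGE operations then rewrite files instead.
spark.sql("""
    ALTER TABLE silver.orders
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false')
""")
```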
Would appreciate any guidance or best practices from the community.
Metrics table:
| MetricName | TotalMetricValue | TotalNumberTasks | AvgCompletionTime |
|---|---:|---:|---:|
| number of non-local (rescheduled) scan tasks | 10 | 10 | 9531.9 |
| unified cache hits count for RangeDeletionVector | 162448 | 14137 | 6274.97798742138 |
| number of deletion vector rows read from disk cache | 1710803 | 14154 | 6138.35435435436 |
| number of deletion vector rows read | 1710803 | 14154 | 6138.35435435436 |
| number of deletion vectors read | 162794 | 14154 | 6138.35435435436 |
| number of deletion vectors read from disk cache | 162794 | 14154 | 6138.35435435436 |
| internal.metrics.resultSerializationTime | 142 | 2792 | 5848.54285714286 |
| unified cache coalesce count for RangeDeletionVector | 242 | 2554 | 5409.65979381443 |
| time spent reading deletion vectors | 63656715484 | 108783 | 5310.43413173653 |
| size of deletion vectors read from disk cache | 8610813 | 108783 | 5310.43413173653 |
| size of deletion vectors read | 8610813 | 108783 | 5310.43413173653 |
| time spent waiting for fetched deletion vectors | 0 | 108783 | 5310.43413173653 |
| size of deletion vectors read from memory cache | 0 | 108783 | 5310.43413173653 |
| unified cache coalesce count for ParquetFooter | 67 | 1113 | 5191.16279069768 |
| unified cache hits count for ParquetFooter | 682285 | 102155 | 5115.15193026152 |
| unified cache hits count for ParquetColumnChunk | 2418275 | 102165 | 5109.88888888889 |
| scan time | 4063119 | 106814 | 5071.2162818955 |
| number of input batches | 1662757 | 106814 | 5071.2162818955 |
| unified cache populate time for ParquetFooter | 485712457512 | 109744 | 5009.97760617761 |
| uncompressed bytes read after filtering | 262858290309 | 109744 | 5009.97760617761 |
| cache hits size (uncompressed) | 261150107387 | 109744 | 5009.97760617761 |
| cache hits size | 198250435165 | 109744 | 5009.97760617761 |
| stable cache serve bytes for ParquetColumnChunk | 187484742374 | 109744 | 5009.97760617761 |
| unified cache serve bytes for ParquetColumnChunk | 187484742374 | 109744 | 5009.97760617761 |
| unified cache read bytes for ParquetColumnChunk | 184745766945 | 109744 | 5009.97760617761 |
| unified cache populate time for RangeDeletionVector | 21316692333 | 109744 | 5009.97760617761 |
| unified cache serve bytes for ParquetFooter | 11843089123 | 109744 | 5009.97760617761 |
| stable cache serve bytes for ParquetFooter | 11843089123 | 109744 | 5009.97760617761 |
| unified cache read bytes for ParquetFooter | 11770827958 | 109744 | 5009.97760617761 |
Regards,
Hung Nguyen