05-17-2025 11:45 AM
Hi @utkarshamone ,
We faced a similar issue and I wanted to share our findings, which might help clarify what’s going on.
We’re using a Classic SQL Warehouse size L (v2025.15), and executing a dbt pipeline on top of it.
Our dbt jobs started failing with internal Databricks errors, which also affected our production pipeline.
I then checked the pipeline in depth and found the following in the query profile and Spark UI:
Classic Warehouse (FAILED)
Execution details:
- Fixed 256 shuffle partitions
- Fails in: PhotonUnionShuffleExchangeSink
- Peak memory total ≈ 91.9 GiB
- 0 rows output
- Multiple executors exited with code 134 (SIGABRT)
- Spill = 0 bytes (crashes before spilling)
- Dead executors, hundreds of failed tasks
- Off‑heap memory peak = 7–8 GiB before crash
- Input: 213 GiB read, 671 M rows
- Task time in Photon = 18 %
- My analysis: Photon may under-estimate memory requirements during the union shuffle. One partition becomes too large (“elephant”), exceeds executor memory, malloc fails, and triggers SIGABRT.
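To illustrate the "elephant" partition idea from my analysis, here is a minimal sketch in plain Python. It is not how Photon partitions data: the modulo partitioner, the key distribution, and the function names are all assumptions made up for the example. It only shows how a single hot key can make one of 256 fixed partitions far larger than the rest.

```python
def partition_sizes(rows_per_key, num_partitions):
    """Return the row count per partition under a simple modulo partitioner.

    Assumption: integer keys and `key % num_partitions` stand in for the
    real hash partitioner, which is not described in the post.
    """
    sizes = [0] * num_partitions
    for key, rows in rows_per_key.items():
        sizes[key % num_partitions] += rows
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


# Hypothetical data: 1,024 keys with 1,000 rows each, plus one hot key.
rows_per_key = {key: 1_000 for key in range(1024)}
rows_per_key[7] = 1_000_000

sizes = partition_sizes(rows_per_key, 256)
print(f"largest partition: {max(sizes):,} rows")
print(f"skew ratio: {skew_ratio(sizes):.1f}x the mean")
```

With a fixed partition count, the hot key has nowhere to go: every row for it lands in the same partition, so that one task needs much more memory than the average.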
Serverless Warehouse (SUCCEEDED)
Execution details:
- AQE enabled, partitions dynamically adjusted (~2,000 early, coalesced later)
- Sort operators: 52 GiB / 46 GiB total
- ShuffleExchange: Peak memory = 18 GiB, Peak per-task ≈ 280 MiB
- No executor losses
- Spill = 0 bytes
- Failed Tasks = 0
- Runtime: 1 min 46 s
- Task time in Photon = 99 %
- My analysis: AQE + newer Photon version effectively balances partitions and avoids memory hotspots.
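As a rough back-of-the-envelope check on why the partition count matters, the sketch below compares the average data per partition for 256 versus about 2,000 partitions. It assumes the shuffled volume is on the order of the 213 GiB input read, which is a simplification (I did not measure the actual shuffle size), and the function name is just for illustration.

```python
GIB_IN_MIB = 1024


def avg_partition_mib(total_gib, num_partitions):
    """Average partition size in MiB if data were spread perfectly evenly."""
    return total_gib * GIB_IN_MIB / num_partitions


# Assumption: shuffle volume is roughly the 213 GiB input read.
fixed = avg_partition_mib(213, 256)      # Classic: fixed 256 partitions
adaptive = avg_partition_mib(213, 2000)  # Serverless: ~2,000 early partitions

print(f"256 partitions:   {fixed:.0f} MiB per partition on average")
print(f"2,000 partitions: {adaptive:.0f} MiB per partition on average")
```

Under that assumption the average works out to roughly 852 MiB versus 109 MiB per partition. These are averages only; a skewed partition would be well above them, which is where the smaller starting size leaves more headroom.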
We reported this to Databricks support. They confirmed:
"Engineering identified the root cause and has prepared a fix.
It will be included in the next maintenance cycle, scheduled for end of May 2025."
Until the fix is deployed:
- Check the query profile and Spark UI to identify where the hotspot occurs
- Switch to Serverless SQL Warehouse provisionally for production dbt pipelines (stable + memory-safe)
- Reevaluate using Classic at the end of May, once the new version is available
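When you check the Spark UI for dead executors, it helps to decode the exit code. Exit codes above 128 conventionally mean "terminated by signal (code - 128)", which is why 134 maps to SIGABRT (signal 6 on Linux). Here is a small helper sketch; the function name and the short signal table are my own and only cover a few common Linux signals.

```python
# Assumption: Linux signal numbers; only a few common ones are listed.
LINUX_SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code):
    """Translate a process exit code into a short human-readable hint."""
    if code > 128:
        # Convention: 128 + N means the process was terminated by signal N.
        signum = code - 128
        name = LINUX_SIGNALS.get(signum, f"signal {signum}")
        return f"killed by {name}"
    return "clean exit" if code == 0 else f"exited with error status {code}"


for code in (0, 1, 134, 137):
    print(code, "->", describe_exit_code(code))
```

In our case the executors showed 134, so an abort rather than, say, 137 (SIGKILL), which you would more typically see when a process is killed from outside.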
Hope this helps! 🙂
Isi