Isi

Hi @utkarshamone ,

We faced a similar issue and I wanted to share our findings, which might help clarify what’s going on.

We’re running a dbt pipeline on a Classic SQL Warehouse, size L (v2025.15).

Our dbt jobs started failing with internal Databricks errors, which also affected our production pipeline.

I then looked at the pipeline in depth and found the following in the query profile and the Spark UI:

Classic Warehouse (FAILED)

Execution details:

  • Fixed 256 shuffle partitions
  • Fails in: PhotonUnionShuffleExchangeSink
    • Peak memory total ≈ 91.9 GiB
    • 0 rows output
    • Multiple executors exited with code 134 (SIGABRT)
  • Spill = 0 bytes (crashes before spilling)
  • Dead executors, hundreds of failed tasks
  • Off‑heap memory peak = 7–8 GiB before crash
  • Input: 213 GiB read, 671 M rows
  • Task time in Photon = 18 %
  • My analysis: Photon may underestimate memory requirements during the union shuffle. One partition grows too large (an “elephant” partition) and exceeds executor memory; the allocation then fails and the executor aborts with SIGABRT.
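
As a side note on the exit code above: by the usual Unix shell convention, an exit status above 128 means the process was killed by signal number (status - 128), so 134 maps to signal 6, which is SIGABRT on Linux. A minimal sketch of that decoding, assuming a Linux host (the helper name `decode_exit_code` is made up for illustration):

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe an exit status using the common '128 + signal number' convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


# 134 is the code reported for the dead executors in the Classic run.
print(decode_exit_code(134))
```

This only confirms that the executors aborted; it does not say why, which is where the query profile comes in.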

 

Serverless Warehouse (SUCCEEDED)

Execution details:

  • AQE enabled, partitions dynamically adjusted (~2,000 early, coalesced later)
  • Sort operators: 52 GiB / 46 GiB total
  • ShuffleExchange: Peak memory = 18 GiB, Peak per-task ≈ 280 MiB
  • No executor losses
  • Spill = 0 bytes
  • Failed Tasks = 0
  • Runtime: 1 min 46 s
  • Task time in Photon = 99 %
  • My analysis: AQE plus the newer Photon version appear to balance partitions effectively and avoid memory hotspots.
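
To put the partition counts of the two runs side by side, here is a rough back-of-the-envelope sketch. It assumes both runs processed the same 213 GiB of input and that data is spread evenly, neither of which is guaranteed: bytes read at scan time are not the same as shuffle bytes, and an average hides exactly the kind of skew suspected here. Treat the numbers as orders of magnitude only.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def avg_partition_mib(total_bytes: int, num_partitions: int) -> float:
    """Average partition size in MiB, assuming a perfectly even split."""
    return total_bytes / num_partitions / MIB


# "213 GiB read" from the Classic run; assumed to be the same for Serverless.
input_bytes = 213 * GIB

for label, parts in [
    ("Classic, fixed 256 partitions", 256),
    ("Serverless, ~2,000 partitions early on", 2000),
]:
    size = avg_partition_mib(input_bytes, parts)
    print(f"{label}: ~{size:.0f} MiB per partition on average")
```

Even with a perfectly even split, each of the 256 fixed partitions carries roughly eight times more data than one of ~2,000, so a single skewed partition has far less headroom on Classic.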

 

We reported this to Databricks support. They confirmed:

"Engineering identified the root cause and has prepared a fix.
It will be included in the next maintenance cycle, scheduled for end of May 2025."

Until the fix is deployed:

  • Check the query profile and Spark UI to identify where the hotspot occurs
  • Switch production dbt pipelines to a Serverless SQL Warehouse temporarily (stable and memory-safe in our case)
  • Reevaluate using Classic at the end of May, once the new version is available
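
For the first point, one simple way to quantify a hotspot is to compare the largest partition with the median one. The sketch below assumes you have copied per-task record counts for the failing shuffle stage out of the Spark UI; the function name `skew_report` and the sample numbers are invented for illustration and are not from our pipeline.

```python
from statistics import median


def skew_report(rows_per_partition: list[int]) -> dict:
    """Summarize how far the largest partition deviates from the typical one."""
    largest = max(rows_per_partition)
    typical = median(rows_per_partition)
    return {
        "max_rows": largest,
        "median_rows": typical,
        # A ratio near 1 means balanced partitions; a large ratio points to skew.
        "skew_ratio": largest / typical if typical else float("inf"),
    }


# Hypothetical counts: 255 similar partitions and one much larger one.
sample = [2_600_000] * 255 + [8_000_000]
report = skew_report(sample)
print(report)
```

A high ratio on the stage that crashes would support the “elephant partition” explanation; a ratio close to 1 would suggest the problem lies elsewhere.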

 

Hope this helps! 🙂

Isi