Detecting Photon fallback in-cluster + safe right-sizing from system tables

Yogasathyandrun
Visitor

I'm prototyping a cluster cost / right-sizing advisor and wanted to get a reality-check from people running Databricks at real scale before I sink more time into it.

The main thing I'm chasing is Photon fallback. Photon quietly drops to the JVM on unsupported ops (Python UDFs, some struct/array predicates, a few Delta features), so you keep paying the Photon DBU premium while getting JVM speed, and as far as I can tell it's basically invisible in the UI. Alongside that, the usual right-sizing stuff: over-provisioned workers/driver, idle clusters.

Where I've got to so far:

In-cluster collection — a bundled JVM QueryExecutionListener reads executedPlan and only flags real fallback (mid-plan ColumnarToRow, RowToColumnar round-trips, BatchEvalPython/ArrowEvalPython), ignoring the benign terminal ColumnarToRow that every query ends on. A SparkListener grabs the executor curve and stage/task timing. It self-arms at interpreter startup via a .pth.
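To make the "real fallback vs. benign terminal" rule concrete, here is a minimal Python sketch of the plan walk. It is not the bundled JVM listener: `PlanNode` is a hypothetical stand-in for a physical-plan operator, the Photon operator names in the usage note are illustrative, and it assumes the terminal `ColumnarToRow` sits at the root of the tree, which may not hold once wrappers such as adaptive execution nodes are involved.

```python
from dataclasses import dataclass, field
from typing import List

# Operator names the post treats as fallback evidence.
PYTHON_EVAL_NODES = {"BatchEvalPython", "ArrowEvalPython"}
TRANSITION_NODES = {"ColumnarToRow", "RowToColumnar"}


@dataclass
class PlanNode:
    """Hypothetical stand-in for one physical-plan operator."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def find_fallbacks(root: PlanNode) -> List[str]:
    """Return the names of operators that indicate real fallback.

    A ColumnarToRow at the root is treated as the normal terminal
    transition and skipped. Any transition deeper in the tree, and any
    Python eval node anywhere, is reported.
    """
    flagged: List[str] = []
    stack = [(root, True)]  # (node, is_root)
    while stack:
        node, is_root = stack.pop()
        benign_terminal = is_root and node.name == "ColumnarToRow"
        if node.name in PYTHON_EVAL_NODES:
            flagged.append(node.name)
        elif node.name in TRANSITION_NODES and not benign_terminal:
            flagged.append(node.name)
        stack.extend((child, False) for child in node.children)
    return flagged
```

For example, `find_fallbacks(PlanNode("ColumnarToRow", [PlanNode("PhotonScan")]))` returns an empty list, while a plan with a `BatchEvalPython` sandwiched between `RowToColumnar` and a second `ColumnarToRow` reports all three inner operators.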

System tables for ground truth — node_timeline (CPU/mem, P95), compute.clusters (config/autoscale), billing.usage × list_prices (billed cost).

Engine — classify FIXED/AUTOSCALE, then a safety step (won't suggest a downsize if peak CPU/mem is high, there's memory spill, or the evidence is thin), then cost (billed when I have the grant, else modeled), and runtime impact as a bounded range rather than a single number.
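Roughly, the engine's classification and downsize gate look like the sketch below. Everything numeric here is a placeholder: the thresholds (`cpu_limit`, `mem_limit`, `min_days`) are hypothetical, and the runtime range uses a deliberately crude assumption (best case no slowdown, worst case slowdown proportional to the workers removed), not a validated model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ClusterEvidence:
    """Hypothetical summary of what was observed for one cluster."""
    min_workers: int
    max_workers: int
    peak_cpu_pct: float
    peak_mem_pct: float
    spill_bytes: int
    sample_days: int


def classify(ev: ClusterEvidence) -> str:
    """FIXED when the worker range is a single size, AUTOSCALE otherwise."""
    return "AUTOSCALE" if ev.max_workers > ev.min_workers else "FIXED"


def downsize_blocker(
    ev: ClusterEvidence,
    cpu_limit: float = 80.0,   # placeholder threshold
    mem_limit: float = 80.0,   # placeholder threshold
    min_days: int = 14,        # placeholder minimum observation window
) -> Optional[str]:
    """Return the reason a downsize must not be suggested, or None if it is allowed."""
    if ev.sample_days < min_days:
        return "thin evidence"
    if ev.spill_bytes > 0:
        return "memory spill"
    if ev.peak_cpu_pct >= cpu_limit:
        return "high peak CPU"
    if ev.peak_mem_pct >= mem_limit:
        return "high peak memory"
    return None


def runtime_impact_range(current_workers: int, proposed_workers: int) -> Tuple[float, float]:
    """Slowdown factor as (best, worst) under the crude linear assumption above."""
    if proposed_workers <= 0 or proposed_workers > current_workers:
        raise ValueError("proposed_workers must be in 1..current_workers")
    return (1.0, current_workers / proposed_workers)
```

Returning the blocking reason instead of a bare boolean keeps the "why not" visible in the recommendation, which matters when the answer is "thin evidence" rather than "cluster is busy".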

The stuff I'm actually stuck on:

  • Shared & serverless seal the JVM (Spark Connect), so the listener can't attach and I get nothing for Photon on standard/shared access mode. Has anyone found a supported way to see Photon fallback there (query profile API, system.query.history, something on the roadmap), or is in-process really the only path today?
  • Rolling the listener out across a fleet — what's the least-intrusive pattern your platform team would actually sign off on? Cluster policy + allowlisted library, a global init script, spark.sql.queryExecutionListeners via policy (and does that even register on shared mode for you)?
  • Observation window for right-sizing — to avoid recommending a downsize right before a weekend/month-end batch, what do you trust: a fixed N-day window that's guaranteed to cover the business cycle, or a peak-based sample gate? Curious what's held up in practice.
  • system.billing per-cluster attribution — any gotchas joining usage × list_prices (price-window edges, how complete usage_metadata.cluster_id is, serverless SKUs)?
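For the billing question in the last bullet, this is the join logic I'm assuming, written as plain Python so the edge cases are explicit. It is a simplified sketch: the records are flattened (`unit_price` stands in for the nested price struct, `cluster_id` for `usage_metadata.cluster_id`), it matches on `sku_name` only where the real tables carry more columns, it treats the price window as half-open `[price_start_time, price_end_time)` with a null end meaning "still active", and `"UNATTRIBUTED"` is just my own bucket name for rows with no cluster id.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Price:
    sku_name: str
    price_start_time: datetime
    price_end_time: Optional[datetime]  # None means the price is still active
    unit_price: Decimal


@dataclass
class Usage:
    sku_name: str
    usage_start_time: datetime
    usage_quantity: Decimal
    cluster_id: Optional[str]  # may be missing, e.g. for non-cluster usage


def price_for(u: Usage, prices: List[Price]) -> Optional[Price]:
    """Pick the price whose half-open window contains the usage start time."""
    for p in prices:
        if p.sku_name != u.sku_name:
            continue
        if u.usage_start_time < p.price_start_time:
            continue
        if p.price_end_time is not None and u.usage_start_time >= p.price_end_time:
            continue
        return p
    return None


def cost_by_cluster(usage: List[Usage], prices: List[Price]) -> Dict[str, Decimal]:
    """Sum quantity * unit price per cluster; rows without a price are skipped."""
    totals: Dict[str, Decimal] = {}
    for u in usage:
        p = price_for(u, prices)
        if p is None:
            continue  # no matching price window; worth counting separately in practice
        key = u.cluster_id or "UNATTRIBUTED"
        totals[key] = totals.get(key, Decimal("0")) + u.usage_quantity * p.unit_price
    return totals
```

The half-open window is the part I'd most like confirmed: with it, a usage row starting exactly at a price change picks up the new price and never matches two windows at once.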

And the two I most want opinions on:

  • If you're already fighting cluster cost, what's the part that's still annoying and unsolved? (idle detection, autoscaling tuning, ephemeral job-cluster sprawl, Photon ROI, spot/driver sizing, whatever it is.)
  • Does this already overlap something that does it well (system-tables dashboards, Overwatch, a third-party FinOps tool)? And if so, where's the gap that's still worth filling?

Not selling anything, just trying to work out whether I'm reinventing a wheel or if there's a real gap here. Happy to be told it's the former.

Data Engineer | Apache Spark | Delta Lake | Databricks