03-11-2026 06:58 AM
We have a custom Python notebook that handles data loading; in this case, it performs a full overwrite of specific partitions. The notebook determines which columns to use for the update based on the incoming data and builds a replace condition like this: replaceCondition : (sales_date sales_date,integration_store_key integration_store_key) IN ((to_date('2026-03-10'),'ABOUTYOU_115CH_TH_WHOL_4054986000000'),(to_date('2026-03-10'),'DE_PCNORD_CK_WHOL_4043654007816'), (...). At execution time, it looks like this: inset(named_struct('sales_date', sales_date, 'integration_store_key', integration_store_key))
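For reference, the condition-building step described above might look roughly like the following. This is a minimal sketch, not the notebook's actual code (which isn't shown): `sql_literal` and `build_replace_condition` are hypothetical helpers, the column list is written in its simple form `(sales_date, integration_store_key)`, and keys are assumed never to contain single quotes.

```python
from datetime import date


def sql_literal(value):
    """Render a Python value as a Spark SQL literal (illustrative subset only)."""
    if isinstance(value, date):
        return f"to_date('{value.isoformat()}')"
    if isinstance(value, str):
        # Assumption: keys contain no single quotes, so no escaping is done.
        return f"'{value}'"
    return str(value)


def build_replace_condition(columns, rows):
    """Return a tuple-IN predicate of the form (c1, c2) IN ((v1, v2), ...)."""
    column_list = ", ".join(columns)
    tuples = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"({column_list}) IN ({tuples})"


rows = [
    (date(2026, 3, 10), "ABOUTYOU_115CH_TH_WHOL_4054986000000"),
    (date(2026, 3, 10), "DE_PCNORD_CK_WHOL_4043654007816"),
]
condition = build_replace_condition(["sales_date", "integration_store_key"], rows)
print(condition)
```

Presumably a string like this ends up in something like Delta's `replaceWhere` write option (the post doesn't say which API is used). In open-source Spark, a multi-column IN is evaluated by comparing a `named_struct` of the columns against struct values, and an IN list of constants longer than `spark.sql.optimizer.inSetConversionThreshold` (default 10) is rewritten to `InSet`, which is likely where the `inset(named_struct(...))` form comes from.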
During execution, Photon is not engaged for the filter; instead, execution falls back to COLUMNAR_TO_ROW → FILTER_EXEC in the JVM. Even though the target table is liquid clustered on these two columns, no data skipping happens and a full scan is executed at a huge cost (over a billion rows read to update a few thousand). There is zero row group pruning: Rows skipped via stats filtering = 0, Row groups skipped via stats filtering = 0, Data filters - row groups filtered = 0.
The Query Profile shows:
Graph 0, photonExplain:
reason: UNIMPLEMENTED_OPERATOR
params: OPERATOR = OverwriteByExpressionExecV1 [...]

Graph 1, the filter node:
tag: FILTER_EXEC ← JVM operator, not PHOTON_FILTER_EXEC
insightIds: ['99716c5c...']
condition: inset(named_struct('sales_date', ..., 'integration_store_key', ...))

And immediately above it:
tag: COLUMNAR_TO_ROW_EXEC ← data leaving Photon back to JVM
tag: PHOTON_PROJECT_EXEC ← Photon stops here

The Databricks documentation states: "getStatsColumnOpt filters out non-leaf StructType columns as they lack statistics and skipping predicates can't use them." In my case, sales_date and integration_store_key are not struct columns themselves; they're scalar columns being wrapped into a struct by the named_struct(...) expression that inset() generates. It seems that because the columns are wrapped in a struct, statistics are not available and execution defaults to the JVM. Is it correct to say that "skipping works on per-column min/max stats, and the inset(named_struct(...)) predicate is not decomposable into per-column comparisons the skipping engine can evaluate"? Can anyone confirm why Photon doesn't work in my situation?
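To make the per-column min/max reasoning in the quoted statement concrete, here is a toy model of file-level data skipping. It is not how Delta Lake or Photon implement skipping internally; the file names, their min/max values, and the helpers `can_skip` and `files_to_scan` are all hypothetical. It only illustrates that stats keyed by scalar column can rule files out for per-column conditions, while a predicate over a struct built at query time has no stats to compare against, so every file is kept.

```python
from datetime import date

# Hypothetical per-file min/max statistics for the two scalar key columns.
# There is no entry for a struct built from them: no such stats exist.
FILE_STATS = {
    "part-000": {"sales_date": (date(2026, 3, 1), date(2026, 3, 5)),
                 "integration_store_key": ("A", "M")},
    "part-001": {"sales_date": (date(2026, 3, 9), date(2026, 3, 11)),
                 "integration_store_key": ("A", "F")},
    "part-002": {"sales_date": (date(2026, 3, 9), date(2026, 3, 11)),
                 "integration_store_key": ("N", "Z")},
}

STRUCT_COLUMN = "named_struct(sales_date, integration_store_key)"


def can_skip(stats, column, values):
    """A file is skippable only if stats exist and no value lies in [min, max]."""
    if column not in stats:
        return False  # no statistics for this expression: the file must be read
    low, high = stats[column]
    return not any(low <= v <= high for v in values)


def files_to_scan(file_stats, predicates):
    """Keep every file that no conjunct of the predicate can rule out."""
    return [
        name
        for name, stats in file_stats.items()
        if not any(can_skip(stats, col, vals) for col, vals in predicates.items())
    ]


keys = [
    (date(2026, 3, 10), "ABOUTYOU_115CH_TH_WHOL_4054986000000"),
    (date(2026, 3, 10), "DE_PCNORD_CK_WHOL_4043654007816"),
]

# Tuple IN treated as one struct-valued predicate: nothing can be skipped.
struct_predicate = {STRUCT_COLUMN: keys}

# Weaker per-column predicates implied by the same key list; these can be
# checked against per-column min/max stats.
per_column_predicate = {
    "sales_date": {k[0] for k in keys},
    "integration_store_key": {k[1] for k in keys},
}

print(files_to_scan(FILE_STATS, struct_predicate))      # ['part-000', 'part-001', 'part-002']
print(files_to_scan(FILE_STATS, per_column_predicate))  # ['part-001']
```

Per-column conditions derived this way select a superset of the target rows, so they can only prune files: a kept file whose ranges overlap both key lists may still contain none of the exact (sales_date, integration_store_key) pairs, and the tuple predicate still has to be applied to it.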