Photon not used for the filter step (falls back to COLUMNAR_TO_ROW → FILTER_EXEC in JVM)

ItalSess_5094
New Contributor II

We have a custom Python notebook that handles data loading. In this case, it performs a full overwrite of specific partitions. The notebook determines which columns to use for the update based on the incoming data and builds a replace condition like this:

replaceCondition: (sales_date, integration_store_key) IN ((to_date('2026-03-10'), 'ABOUTYOU_115CH_TH_WHOL_4054986000000'), (to_date('2026-03-10'), 'DE_PCNORD_CK_WHOL_4043654007816'), (...))

By execution time, it looks like this:

inset(named_struct('sales_date', sales_date, 'integration_store_key', integration_store_key))
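For illustration, here is a minimal sketch of how a condition of this shape might be assembled from the incoming keys. The helper names (`sql_literal`, `tuple_in_condition`, `with_per_column_filters`) are hypothetical and are not taken from the actual notebook. The second variant prepends redundant per-column IN filters, which are implied by the tuple IN and so leave the predicate logically unchanged. Whether the optimizer then uses them for data skipping is an assumption that would need to be verified in the query profile.

```python
from datetime import date


def sql_literal(value):
    """Render a Python value as a SQL literal (dates and plain strings only)."""
    if isinstance(value, date):
        return f"to_date('{value.isoformat()}')"
    text = str(value)
    # Keep the sketch simple: refuse values that would need escaping.
    if "'" in text or "\\" in text:
        raise ValueError(f"unsupported character in literal: {text!r}")
    return f"'{text}'"


def tuple_in_condition(columns, rows):
    """Build '(c1, c2) IN ((v1, v2), ...)', the shape described in the post."""
    cols = ", ".join(columns)
    tuples = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"({cols}) IN ({tuples})"


def with_per_column_filters(columns, rows):
    """Prefix the tuple IN with one plain IN filter per column.

    Each per-column filter is a necessary condition of the tuple IN,
    so the combined predicate selects exactly the same rows.
    """
    parts = []
    for i, col in enumerate(columns):
        distinct = sorted({row[i] for row in rows})
        values = ", ".join(sql_literal(v) for v in distinct)
        parts.append(f"{col} IN ({values})")
    parts.append(tuple_in_condition(columns, rows))
    return " AND ".join(parts)


columns = ["sales_date", "integration_store_key"]
rows = [
    (date(2026, 3, 10), "ABOUTYOU_115CH_TH_WHOL_4054986000000"),
    (date(2026, 3, 10), "DE_PCNORD_CK_WHOL_4043654007816"),
]

print(tuple_in_condition(columns, rows))
print(with_per_column_filters(columns, rows))
```

The per-column variant only adds scalar comparisons on `sales_date` and `integration_store_key`; it does not change which rows match.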

During execution, Photon is not engaged for the filter; instead it falls back to COLUMNAR_TO_ROW → FILTER_EXEC in the JVM. Even though the target table is liquid clustered on the two columns, no data skipping happens and a full scan is executed at a huge cost (over a billion rows read to update a few thousand). There is zero row group pruning: Rows skipped via stats filtering = 0, Row groups skipped via stats filtering = 0, Data filters - row groups filtered = 0.

The Query Profile shows:

Graph 0, photonExplain:

 
 
reason: UNIMPLEMENTED_OPERATOR
params: OPERATOR = OverwriteByExpressionExecV1 [...]

Graph 1, the filter node:

 
 
tag: FILTER_EXEC          ← JVM operator, not PHOTON_FILTER_EXEC
insightIds: ['99716c5c...']
condition: inset(named_struct('sales_date', ..., 'integration_store_key', ...))

And immediately above it:

 
 
tag: COLUMNAR_TO_ROW_EXEC ← data leaving Photon back to JVM
tag: PHOTON_PROJECT_EXEC  ← Photon stops here

Databricks documentation states: "getStatsColumnOpt filters out non-leaf StructType columns as they lack statistics and skipping predicates can't use them." In my case, sales_date and integration_store_key are not themselves struct columns; they are scalar columns that get wrapped into a struct by the named_struct(...) expression that inset() generates. It seems that because the columns are wrapped in a struct, statistics are not available and the filter falls back to the JVM.

Is it correct to say that "skipping works on per-column min/max stats, and the inset(named_struct(...)) predicate is not decomposable into per-column comparisons the skipping engine can evaluate"? Can anyone confirm the reason Photon is not used in my situation?
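To make the hypothesis concrete, here is a toy model of min/max-based file skipping. It is only a sketch of the general idea and not Delta's or Photon's actual implementation; `FileStats`, `may_contain`, and `files_to_scan` are hypothetical names, and dates are kept as ISO strings so that they compare lexicographically. The point it illustrates is that the statistics exist per scalar column, so a membership test on a struct value has to be broken down into per-column checks before any file can be ruled out.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file, per-column min/max statistics (toy model)."""
    name: str
    min_vals: dict
    max_vals: dict


def may_contain(stats, column, values):
    """True if any wanted value falls inside the file's [min, max] range."""
    lo, hi = stats.min_vals[column], stats.max_vals[column]
    return any(lo <= v <= hi for v in values)


def files_to_scan(files, wanted):
    """Keep files that could hold at least one wanted (date, key) pair.

    The tuple membership is decomposed into two per-column necessary
    conditions, because stats are stored per column, not per struct.
    """
    dates = {d for d, _ in wanted}
    keys = {k for _, k in wanted}
    return [
        f.name
        for f in files
        if may_contain(f, "sales_date", dates)
        and may_contain(f, "integration_store_key", keys)
    ]


files = [
    FileStats("f1",
              {"sales_date": "2026-03-01", "integration_store_key": "A"},
              {"sales_date": "2026-03-05", "integration_store_key": "Z"}),
    FileStats("f2",
              {"sales_date": "2026-03-09", "integration_store_key": "AA"},
              {"sales_date": "2026-03-11", "integration_store_key": "AZ"}),
    FileStats("f3",
              {"sales_date": "2026-03-10", "integration_store_key": "FR_"},
              {"sales_date": "2026-03-10", "integration_store_key": "IT_"}),
]

wanted = [
    ("2026-03-10", "ABOUTYOU_115CH_TH_WHOL_4054986000000"),
    ("2026-03-10", "DE_PCNORD_CK_WHOL_4043654007816"),
]

print(files_to_scan(files, wanted))
```

In this model, f1 is skipped on the date range and f3 on the key range, so only f2 is read. If the predicate stays as one opaque struct comparison and is never decomposed this way, every file has to be scanned, which would be consistent with the zero-skipping counters above.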