Hi @Malthe
You're absolutely right - it's completely reasonable to want to verify constraint integrity without relying on the optimizer's assumptions.
This is a classic challenge with query optimizers that use constraint information for optimization.
Current Options in Databricks
Unfortunately, there isn't a straightforward session-level setting to disable RELY-based optimizations while keeping Photon enabled.
However, here are several approaches you can use:
1. Query Hints/Optimizer Directives
While Databricks doesn't expose all optimizer controls publicly.
2. Rewrite Queries to Avoid Optimizer Shortcuts
Structure your integrity checks to bypass the optimizer's assumptions:
-- Instead of direct constraint validation
-- Use subqueries or CTEs that force full evaluation
WITH raw_data AS (
SELECT * FROM your_table TABLESAMPLE (100 PERCENT)
)
SELECT COUNT(*) FROM raw_data WHERE NOT (your_constraint_condition);
3. Use TABLESAMPLE or DISTRIBUTE BY
Force the query planner to read all data:
SELECT COUNT(*)
FROM your_table TABLESAMPLE (100 PERCENT)
WHERE NOT (your_constraint_condition)
DISTRIBUTE BY rand();
4. Temporary Constraint Removal
-- Remove RELY temporarily
ALTER TABLE your_table DROP CONSTRAINT constraint_name;
-- Run integrity check
-- Re-add constraint with RELY
ALTER TABLE your_table ADD CONSTRAINT constraint_name CHECK (...) RELY;
LR