cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How to check integrity on tables with PRIMARY KEY RELY optimization

Malthe
New Contributor III

Databricks can now use RELY to optimize some queries when using Photon-enabled compute.

But what if one wanted to check the integrity of the table, actually not relying on the constraint. That's not an unreasonable ask I would think.

Is there a way to run a query with these optimizations disabled? Ideally, without needing to set up a cluster where Photon is disabled (also because it doesn't feel like a very stable way to guarantee that the integrity check runs correctly).

2 REPLIES 2

lingareddy_Alva
Honored Contributor II

Hi @Malthe 

You're absolutely right - it's completely reasonable to want to verify constraint integrity without relying on the optimizer's assumptions.
This is a classic challenge with query optimizers that use constraint information for optimization.

Current Options in Databricks
Unfortunately, there isn't a straightforward session-level setting to disable RELY-based optimizations while keeping Photon enabled.
However, here are several approaches you can use:

1. Query Hints/Optimizer Directives
While Databricks doesn't expose all optimizer controls publicly.

2. Rewrite Queries to Avoid Optimizer Shortcuts
Structure your integrity checks to bypass the optimizer's assumptions:

-- Instead of direct constraint validation
-- Use subqueries or CTEs that force full evaluation
WITH raw_data AS (
SELECT * FROM your_table TABLESAMPLE (100 PERCENT)
)
SELECT COUNT(*) FROM raw_data WHERE NOT (your_constraint_condition);


3. Use TABLESAMPLE or DISTRIBUTE BY
Force the query planner to read all data:

SELECT COUNT(*)
FROM your_table TABLESAMPLE (100 PERCENT)
WHERE NOT (your_constraint_condition)
DISTRIBUTE BY rand();

4. Temporary Constraint Removal
-- Remove RELY temporarily
ALTER TABLE your_table DROP CONSTRAINT constraint_name;
-- Run integrity check
-- Re-add constraint with RELY
ALTER TABLE your_table ADD CONSTRAINT constraint_name CHECK (...) RELY;

 

 

LR

Malthe
New Contributor III

Unfortunately, none of these suggestions had any effect.

I seem to have been able (for now) to work around the optimization using EXECUTE IMMEDIATE sql INTO var, crafting a query string on the form "SELECT COUNT(*) - COUNT(DISTINCT id)".

I suppose the dynamic evaluation of the query somehow disables the optimization.