<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to check integrity on tables with PRIMARY KEY RELY optimization in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-check-integrity-on-tables-with-primary-key-rely/m-p/121751#M46538</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9268"&gt;@Malthe&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You're absolutely right - it's completely reasonable to want to verify constraint integrity without relying on the optimizer's assumptions.&lt;BR /&gt;This is a classic challenge with query optimizers that use constraint information for optimization.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Current Options in Databricks&lt;/STRONG&gt;&lt;BR /&gt;Unfortunately, there isn't a straightforward session-level setting to disable RELY-based optimizations while keeping Photon enabled.&lt;BR /&gt;However, here are several approaches you can try:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1. Query Hints/Optimizer Directives&lt;/STRONG&gt;&lt;BR /&gt;Databricks doesn't expose all optimizer controls publicly.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. Rewrite Queries to Avoid Optimizer Shortcuts&lt;/STRONG&gt;&lt;BR /&gt;Structure your integrity checks to bypass the optimizer's assumptions:&lt;/P&gt;&lt;P&gt;-- Instead of direct constraint validation,&lt;BR /&gt;-- use subqueries or CTEs that force full evaluation&lt;BR /&gt;WITH raw_data AS (&lt;BR /&gt;SELECT * FROM your_table TABLESAMPLE (100 PERCENT)&lt;BR /&gt;)&lt;BR /&gt;SELECT COUNT(*) FROM raw_data WHERE NOT (your_constraint_condition);&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Use TABLESAMPLE or DISTRIBUTE BY&lt;/STRONG&gt;&lt;BR /&gt;Force the query planner to read all data:&lt;/P&gt;&lt;P&gt;SELECT COUNT(*)&lt;BR /&gt;FROM your_table TABLESAMPLE (100 PERCENT)&lt;BR /&gt;WHERE NOT (your_constraint_condition)&lt;BR /&gt;DISTRIBUTE BY rand();&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;4. Temporary Constraint Removal&lt;/STRONG&gt;&lt;BR /&gt;-- Drop the constraint temporarily&lt;BR /&gt;ALTER TABLE your_table DROP CONSTRAINT constraint_name;&lt;BR /&gt;-- Run integrity check&lt;BR /&gt;-- Re-add constraint with RELY&lt;BR /&gt;ALTER TABLE your_table ADD CONSTRAINT constraint_name PRIMARY KEY (...) RELY;&lt;/P&gt;</description>
    <pubDate>Fri, 13 Jun 2025 21:37:08 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2025-06-13T21:37:08Z</dc:date>
    <item>
      <title>How to check integrity on tables with PRIMARY KEY RELY optimization</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-check-integrity-on-tables-with-primary-key-rely/m-p/121748#M46536</link>
      <description>&lt;P&gt;Databricks can now use RELY to &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/user/queries/query-optimization-constraints" target="_self"&gt;optimize some queries&lt;/A&gt; when using Photon-enabled compute.&lt;/P&gt;&lt;P&gt;But what if one wanted to check the integrity of the table, actually &lt;EM&gt;not relying&lt;/EM&gt; on the constraint? That's not an unreasonable ask, I would think.&lt;/P&gt;&lt;P&gt;Is there a way to run a query with these optimizations disabled? Ideally, without needing to set up a cluster where Photon is disabled (also because that doesn't feel like a very stable way to guarantee that the integrity check runs correctly).&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jun 2025 21:20:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-check-integrity-on-tables-with-primary-key-rely/m-p/121748#M46536</guid>
      <dc:creator>Malthe</dc:creator>
      <dc:date>2025-06-13T21:20:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to check integrity on tables with PRIMARY KEY RELY optimization</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-check-integrity-on-tables-with-primary-key-rely/m-p/121751#M46538</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9268"&gt;@Malthe&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You're absolutely right - it's completely reasonable to want to verify constraint integrity without relying on the optimizer's assumptions.&lt;BR /&gt;This is a classic challenge with query optimizers that use constraint information for optimization.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Current Options in Databricks&lt;/STRONG&gt;&lt;BR /&gt;Unfortunately, there isn't a straightforward session-level setting to disable RELY-based optimizations while keeping Photon enabled.&lt;BR /&gt;However, here are several approaches you can try:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1. Query Hints/Optimizer Directives&lt;/STRONG&gt;&lt;BR /&gt;Databricks doesn't expose all optimizer controls publicly.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. Rewrite Queries to Avoid Optimizer Shortcuts&lt;/STRONG&gt;&lt;BR /&gt;Structure your integrity checks to bypass the optimizer's assumptions:&lt;/P&gt;&lt;P&gt;-- Instead of direct constraint validation,&lt;BR /&gt;-- use subqueries or CTEs that force full evaluation&lt;BR /&gt;WITH raw_data AS (&lt;BR /&gt;SELECT * FROM your_table TABLESAMPLE (100 PERCENT)&lt;BR /&gt;)&lt;BR /&gt;SELECT COUNT(*) FROM raw_data WHERE NOT (your_constraint_condition);&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Use TABLESAMPLE or DISTRIBUTE BY&lt;/STRONG&gt;&lt;BR /&gt;Force the query planner to read all data:&lt;/P&gt;&lt;P&gt;SELECT COUNT(*)&lt;BR /&gt;FROM your_table TABLESAMPLE (100 PERCENT)&lt;BR /&gt;WHERE NOT (your_constraint_condition)&lt;BR /&gt;DISTRIBUTE BY rand();&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;4. Temporary Constraint Removal&lt;/STRONG&gt;&lt;BR /&gt;-- Drop the constraint temporarily&lt;BR /&gt;ALTER TABLE your_table DROP CONSTRAINT constraint_name;&lt;BR /&gt;-- Run integrity check&lt;BR /&gt;-- Re-add constraint with RELY&lt;BR /&gt;ALTER TABLE your_table ADD CONSTRAINT constraint_name PRIMARY KEY (...) RELY;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jun 2025 21:37:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-check-integrity-on-tables-with-primary-key-rely/m-p/121751#M46538</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-13T21:37:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to check integrity on tables with PRIMARY KEY RELY optimization</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-check-integrity-on-tables-with-primary-key-rely/m-p/121767#M46542</link>
      <description>&lt;P&gt;Unfortunately, none of these suggestions had any effect.&lt;/P&gt;&lt;P&gt;I seem to have been able (for now) to work around the optimization using EXECUTE IMMEDIATE sql INTO var, crafting a query string of the form "SELECT COUNT(*) - COUNT(DISTINCT id)".&lt;/P&gt;&lt;P&gt;I suppose the dynamic evaluation of the query somehow disables the optimization.&lt;/P&gt;</description>
      <pubDate>Sat, 14 Jun 2025 05:56:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-check-integrity-on-tables-with-primary-key-rely/m-p/121767#M46542</guid>
      <dc:creator>Malthe</dc:creator>
      <dc:date>2025-06-14T05:56:56Z</dc:date>
    </item>
  </channel>
</rss>

