<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic NOT NULL constraint violated for column during OPTIMIZE in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/not-null-constraint-violated-for-column-during-optimize/m-p/161176#M54997</link>
    <description>&lt;P&gt;We're running OPTIMIZE on a Delta table with a VARIANT column that has a NOT NULL constraint.&lt;/P&gt;&lt;P&gt;There are no NULL entries in this column, and yet OPTIMIZE gives this error:&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;[&lt;A class="" href="https://learn.microsoft.com/azure/databricks/error-messages/error-classes#delta_not_null_constraint_violated" target="_blank" rel="noopener noreferrer"&gt;DELTA_NOT_NULL_CONSTRAINT_VIOLATED&lt;/A&gt;] NOT NULL constraint violated for column: body.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;What gives? There are also no &lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/functions/is_variant_null" target="_self"&gt;is_variant_null&lt;/A&gt; matches. Could the NOT NULL constraint have been violated in some earlier transaction?&lt;/P&gt;&lt;P&gt;I am not able to drop the constraint because this is a streaming table.&lt;/P&gt;</description>
    <pubDate>Thu, 02 Jul 2026 08:48:03 GMT</pubDate>
    <dc:creator>Malthe</dc:creator>
    <dc:date>2026-07-02T08:48:03Z</dc:date>
    <item>
      <title>NOT NULL constraint violated for column during OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/not-null-constraint-violated-for-column-during-optimize/m-p/161176#M54997</link>
      <description>&lt;P&gt;We're running OPTIMIZE on a Delta table with a VARIANT column that has a NOT NULL constraint.&lt;/P&gt;&lt;P&gt;There are no NULL entries in this column, and yet OPTIMIZE gives this error:&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;[&lt;A class="" href="https://learn.microsoft.com/azure/databricks/error-messages/error-classes#delta_not_null_constraint_violated" target="_blank" rel="noopener noreferrer"&gt;DELTA_NOT_NULL_CONSTRAINT_VIOLATED&lt;/A&gt;] NOT NULL constraint violated for column: body.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;What gives? There are also no &lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/functions/is_variant_null" target="_self"&gt;is_variant_null&lt;/A&gt; matches. Could the NOT NULL constraint have been violated in some earlier transaction?&lt;/P&gt;&lt;P&gt;I am not able to drop the constraint because this is a streaming table.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jul 2026 08:48:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/not-null-constraint-violated-for-column-during-optimize/m-p/161176#M54997</guid>
      <dc:creator>Malthe</dc:creator>
      <dc:date>2026-07-02T08:48:03Z</dc:date>
    </item>
    <item>
      <title>Re: NOT NULL constraint violated for column during OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/not-null-constraint-violated-for-column-during-optimize/m-p/161211#M55001</link>
      <description>&lt;P&gt;Thanks for reporting this — and for the thorough investigation you've already done (confirming no&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;IS NULL&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;matches and no&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;is_variant_null&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;hits).&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What's likely happening:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;When OPTIMIZE compacts files, it re-validates constraints against the data in the underlying Parquet files. Even if your current&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;EM&gt;logical&lt;/EM&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;table state has no NULLs, it's possible that the physical files being compacted contain NULLs from an earlier transaction — for example, from a write that occurred before the constraint was added, or a failed/reverted write whose files are still present.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Diagnostic steps:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Check if a full scan surfaces the NULLs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;SELECT _metadata.file_path, body FROM &amp;lt;table&amp;gt; WHERE body IS NULL
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If this returns results, you've found the offending files. If it doesn't, that's itself diagnostic — it would suggest the constraint is being checked at a metadata/rewrite level rather than against actual data, which would point toward a bug.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Double-check the JSON null case:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Since you've already checked&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;is_variant_null&lt;/CODE&gt;, this is likely a non-issue, but to be thorough:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;SELECT COUNT(*) FROM &amp;lt;table&amp;gt; WHERE to_json(body) = 'null'
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A VARIANT holding JSON&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;null&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is not a SQL NULL and shouldn't violate the constraint — but it's worth ruling out as a factor in the validation logic.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Check transaction history:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;DESCRIBE HISTORY &amp;lt;table&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Look for writes that predate the NOT NULL constraint being added, or any failed/aborted operations that might have left orphaned files.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
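&lt;P&gt;The first two checks can also be combined into a per-file summary, which makes it easier to see whether the problem is confined to a handful of files. This is only a sketch: it assumes the hidden &lt;CODE&gt;_metadata.file_path&lt;/CODE&gt; column is available on your table and runtime, and the aliases (&lt;CODE&gt;file_path&lt;/CODE&gt;, &lt;CODE&gt;total_rows&lt;/CODE&gt;, &lt;CODE&gt;sql_nulls&lt;/CODE&gt;, &lt;CODE&gt;variant_nulls&lt;/CODE&gt;) are names picked for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;-- Sketch: count SQL NULLs and VARIANT-encoded nulls per underlying data file.
-- Assumes the hidden _metadata.file_path column is available for this table.
SELECT
  _metadata.file_path             AS file_path,
  COUNT(*)                        AS total_rows,
  COUNT_IF(body IS NULL)          AS sql_nulls,      -- rows where body is a SQL NULL
  COUNT_IF(is_variant_null(body)) AS variant_nulls   -- rows where body holds a JSON null
FROM &amp;lt;table&amp;gt;
GROUP BY _metadata.file_path
ORDER BY sql_nulls DESC, variant_nulls DESC
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If every file shows zero in both columns and OPTIMIZE still fails, that supports the theory that the check is not running against the values a normal query sees.&lt;/P&gt;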
&lt;P&gt;&lt;STRONG&gt;Workarounds:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Clone and test:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Create a clone of the table and run OPTIMIZE on the clone. This isolates the issue with no risk to your streaming table and confirms whether it's file-specific.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;CREATE TABLE &amp;lt;table_clone&amp;gt; SHALLOW CLONE &amp;lt;table&amp;gt;;
OPTIMIZE &amp;lt;table_clone&amp;gt;;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;SDP full refresh:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;If this is an SDP-managed (formerly DLT) streaming table, you could update the pipeline definition to remove the NOT NULL constraint and trigger a full refresh. Note: this reprocesses all source data, so it may be expensive depending on your pipeline's scale.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Manual rewrite (more invasive):&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;If you need to keep the constraint and just need clean files, you could do a one-time rewrite:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;CREATE OR REPLACE TABLE &amp;lt;new_table&amp;gt; AS SELECT * FROM &amp;lt;old_table&amp;gt;;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This rewrites every row into fresh, compacted files. It's more invasive than cloning but can be faster than a full SDP refresh. Check whether the NOT NULL constraint carries over to the new table, and re-add it if it doesn't.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;If diagnostics come up clean:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;If step 1 returns no rows (no NULLs anywhere in the physical files), this is likely a bug, and I'd recommend filing a support ticket.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jul 2026 13:30:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/not-null-constraint-violated-for-column-during-optimize/m-p/161211#M55001</guid>
      <dc:creator>yogeshsingh</dc:creator>
      <dc:date>2026-07-02T13:30:37Z</dc:date>
    </item>
    <item>
      <title>Re: NOT NULL constraint violated for column during OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/not-null-constraint-violated-for-column-during-optimize/m-p/161225#M55005</link>
      <description>&lt;P&gt;So funny story on this: the column in question does not match on "col is null", but it does match on "col::string is null".&lt;/P&gt;&lt;P&gt;Perhaps that's because the column is missing entirely from the Parquet file that holds the affected rows.&lt;/P&gt;&lt;P&gt;Still, this seems to be a bug.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jul 2026 16:25:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/not-null-constraint-violated-for-column-during-optimize/m-p/161225#M55005</guid>
      <dc:creator>Malthe</dc:creator>
      <dc:date>2026-07-02T16:25:43Z</dc:date>
    </item>
  </channel>
</rss>

