cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

NOT NULL constraint violated for column during OPTIMIZE

Malthe
Valued Contributor II

We're running an optimize on a delta table with a VARIANT column that has a NOT NULL constraint.

Now, there are no NULL entries in this column. And yet, OPTIMIZE gives this error:

[DELTA_NOT_NULL_CONSTRAINT_VIOLATED] NOT NULL constraint violated for column: body.

What gives? There is in fact also no is_variant_null matches either. Could it be that the NOT NULL constraint is violated in some earlier transaction?

I am not able to drop the constraint because this is a streaming table.

1 REPLY 1

yogeshsingh
Databricks Employee
Databricks Employee

Thanks for reporting this — and for the thorough investigation you've already done (confirming no IS NULL matches and no is_variant_null hits).

What's likely happening:

When OPTIMIZE compacts files, it re-validates constraints against the data in the underlying Parquet files. Even if your current logical table state has no NULLs, it's possible that the physical files being compacted contain NULLs from an earlier transaction — for example, from a write that occurred before the constraint was added, or a failed/reverted write whose files are still present.

Diagnostic steps:

  1. Check if a full scan surfaces the NULLs:

    SELECT input_file_name(), body FROM <table> WHERE body IS NULL
    

    If this returns results, you've found the offending files. If it doesn't, that's itself diagnostic — it would suggest the constraint is being checked at a metadata/rewrite level rather than against actual data, which would point toward a bug.

  2. Double-check the JSON null case:

    Since you've already checked is_variant_null, this is likely a non-issue, but to be thorough:

    SELECT COUNT(*) FROM <table> WHERE body = parse_json('null')
    

    A VARIANT holding JSON null is not a SQL NULL and shouldn't violate the constraint — but it's worth ruling out as a factor in the validation logic.

  3. Check transaction history:

    DESCRIBE HISTORY <table>
    

    Look for writes that predate the NOT NULL constraint being added, or any failed/aborted operations that might have left orphaned files.

Workarounds:

  • Clone and test: Create a clone of the table and run OPTIMIZE on the clone. This isolates the issue with no risk to your streaming table and confirms whether it's file-specific.

    CREATE TABLE <table_clone> SHALLOW CLONE <table>;
    OPTIMIZE <table_clone>;
    
  • DLT full refresh: If this is a SDP-managed streaming table, you could update the pipeline definition to remove the NOT NULL constraint and trigger a full refresh. Note: this reprocesses all source data, so it may be expensive depending on your pipeline's scale.

  • Manual rewrite (more invasive): If you need to keep the constraint and just need clean files, you could do a one-time rewrite:

    CREATE OR REPLACE TABLE <new_table> AS SELECT * FROM <old_table>;
    

    This forces re-validation and produces clean compacted files. It's more invasive than cloning but faster than a full SDP refresh in some cases.

If diagnostics come up clean:

If step 1 returns no rows (no NULLs anywhere in the physical files), this is likely a bug and I'd recommend filing a support ticket