Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Change in UNRESOLVED_COLUMN error behavior in Runtime 14.3 LTS

Marcus_S
New Contributor

I've noticed a change in how Databricks handles unresolved column references in PySpark when using All-purpose compute (not serverless).

In Databricks Runtime 14.3 LTS, referencing a non-existent column like this:

df = spark.table('default.example').select('nonexistent_column')

...used to immediately throw an error like [UNRESOLVED_COLUMN.WITH_SUGGESTION], even before any actions like .show() or display() were called. This was helpful for catching typos or schema mismatches early.

However, sometime around April 2025, this eager error-throwing behavior disappeared in 14.3 LTS. Now, the error is only raised when a non-lazy action is triggered.

Note that in Runtime 16.4 LTS, the behavior has changed again. While it still doesn't raise an exception right away, I now see a large JSON-formatted error message logged to the console. Here is a screenshot:

(screenshot: Marcus_S_2-1748270966823.png)

This seems like a partial restoration of the old behavior: the error is visible right away again, even if it isn't thrown as an exception.

I couldn't find any mention of this change in the 16.4 LTS release notes. (By the way, I'm ignoring 15.4 LTS because I haven't used it much, but based on a quick test it seems to have the same problems as 14.3 LTS.)

Is this new logging behavior intentional? Is there a config or setting to restore the old eager error-throwing behavior from earlier 14.3 LTS builds?

Final note: The only similar question I have found online is this thread: Lazy evaluation in serverless vs all purpose compu... - Databricks Community - 115459. It's also about the absence of an error message for lazy evaluation. However, the thread is specifically about serverless compute, whereas I am using all-purpose compute. So I think my issue must be different.

Thanks in advance!

1 REPLY

mark_ott
Databricks Employee

Databricks has recently changed how unresolved column references are handled in PySpark on All-purpose compute clusters. In earlier Databricks Runtime (DBR) 14.3 LTS builds, referencing a non-existent column, such as:

python
df = spark.table('default.example').select('nonexistent_column')

would immediately throw an exception (e.g., [UNRESOLVED_COLUMN.WITH_SUGGESTION]) as soon as .select() was called, before any action like .show() or display() was triggered. This was valuable for catching issues early in job scripts, especially for schema validation and quick typo detection.

Changes in Databricks Runtime Behavior (14.3 LTS → 16.4 LTS)

Starting around April 2025, many users have observed that DBR 14.3 LTS stopped raising these exceptions right away; errors are only surfaced when an action is performed, resulting in deferred error reporting. Users found this problematic, as typos or schema mismatches are discovered late in the workflow.

In DBR 16.4 LTS, the behavior changed again: instead of an immediate Python exception, a detailed JSON-formatted error message is logged to the console as soon as the unresolved column reference is made, before any action runs. The exception itself is still raised, and propagated, only when an action actually executes. This makes errors more visible up front, but it does not fully restore the previous eager exception behavior.
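
For illustration, here is a minimal sketch of the behavior described above, assuming a table named default.example exists and does not contain the referenced column:

python
# On current 14.3 LTS / 15.4 LTS / 16.4 LTS builds, this line no longer raises.
# (On 16.4 LTS, a JSON-formatted error is additionally logged to the console here.)
df = spark.table('default.example').select('nonexistent_column')

# The [UNRESOLVED_COLUMN.WITH_SUGGESTION] exception only surfaces once an action
# forces the plan to be analyzed and executed:
df.show()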

Is This Logging Behavior Intentional?

There is no mention of this behavioral change in the official 16.4 LTS release notes, nor in the corresponding PySpark documentation updates as of November 2025. Discussions in Databricks forums and Stack Overflow speculate that this is likely an intentional change to improve visibility for notebook users, especially in interactive settings, but Databricks has not officially documented it.

Related Observations:

  • The change seems limited to console logging rather than exception propagation.

  • It is observed on All-purpose compute clusters, while Serverless clusters might show different behavior.

  • DBR 15.x LTS behaves similarly to 14.3 LTS (errors deferred).

Configurations: Can You Restore Eager Exception Behavior?

There is no documented configuration or Spark/Databricks setting to restore the precise "eager error-throwing" behavior from earlier 14.3 LTS builds as of November 2025.

  • Spark configuration keys (spark.sql.*, etc.) do not control when unresolved column reference errors are raised.

  • No Databricks workspace-level or cluster-level configuration restores pre-action validation, i.e., raising these exceptions during transformations.

  • Workarounds include manually checking column existence before calling .select(), or wrapping selects in a small helper utility (see the sketch after this list); there is no official configuration for this.

  • This behavior is mostly governed by Databricks' custom patches to the PySpark runtime, not by upstream Spark alone.
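
As a rough illustration of the helper-utility workaround mentioned above, here is a minimal sketch. The safe_select wrapper is hypothetical (not a Databricks or Spark API); it validates requested columns against df.columns before calling .select(), so typos fail immediately regardless of when the runtime analyzes the plan:

python
from pyspark.sql import DataFrame

def safe_select(df: DataFrame, *cols: str) -> DataFrame:
    """Fail fast if any requested column is missing from the DataFrame's schema."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Column(s) not found: {missing}; available: {df.columns}")
    return df.select(*cols)

# Example usage: raises ValueError immediately instead of waiting for an action
# df = safe_select(spark.table('default.example'), 'nonexistent_column')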

Recommendations & Workarounds

  • Manual Schema Checks: To get eager validation, check for column existence before the .select() call:

    python
    if 'nonexistent_column' not in df.columns:
        raise ValueError("Column does not exist")
  • Contact Databricks Support: If true eager errors are essential, consider raising a support ticket or feature request asking for the behavior to be restored.

  • Watch for Future Runtime Updates: This aspect may change again in future runtime revisions.

Summary Table

Runtime Version          | Eager Exception | Deferred Error | Console Logging | Config to Restore
-------------------------|-----------------|----------------|-----------------|------------------
14.3 LTS (pre-Apr 2025)  | Yes             | No              | Minimal         | No
14.3 LTS (post-Apr 2025) | No              | Yes             | Minimal         | No
15.4 LTS                 | No              | Yes             | Minimal         | No
16.4 LTS                 | No              | Yes             | JSON error      | No

In summary:
Databricks' new logging behavior in 16.4 LTS appears to be intentional and aimed at visibility, but it is not documented, and there is no configuration to restore the original eager error-throwing behavior. For production or robust validation, manual schema checks remain the best workaround.