Databricks Community

Marcus_S · ‎05-26-2025

I've noticed a change in how Databricks handles unresolved column references in PySpark when using All-purpose compute (not serverless).

In Databricks Runtime 14.3 LTS, referencing a non-existent column like this:

df = spark.table('default.example').select('nonexistent_column')

...used to immediately throw an error like [UNRESOLVED_COLUMN.WITH_SUGGESTION], even before any actions like .show() or display() were called. This was helpful for catching typos or schema mismatches early.

However, sometime around April 2025, this eager error-throwing behavior disappeared in 14.3 LTS. Now, the error is only raised when a non-lazy action is triggered.

Note that in Runtime 16.4 LTS, the behavior has changed again. While it still doesn't raise an exception right away, I now see a large JSON-formatted error message logged to the console. Here is a screenshot

This seems like a partial restoration of the old behavior, where errors were visible right away, even if not thrown as exceptions.

I couldn’t find any mention of this change in the 16.4 LTS release notes. (By the way, I’m ignoring 15.4 LTS because I haven’t used it much, but based on a quick test it seems to have the same problems as version 14.3 LTS.)

Is this new logging behavior intentional? Is there a config or setting to restore the old eager error-throwing behavior from earlier 14.3 LTS builds?

Final note: The only similar question I have found online is this thread: Lazy evaluation in serverless vs all purpose compu... - Databricks Community - 115459. It's also about the absence of an error message for lazy evaluation. However, the thread is specifically about serverless compute, whereas I am using all-purpose compute. So I think my issue must be different.

Thanks in advance!

mark_ott · 3 weeks ago

Databricks has recently changed how unresolved column references are handled in PySpark on All-purpose compute clusters. In earlier Databricks Runtime (DBR) 14.3 LTS builds, referencing a non-existent column—such as:

df = spark.table('default.example').select('nonexistent_column')

would immediately throw an exception (e.g., [UNRESOLVED_COLUMN.WITH_SUGGESTION]) as soon as .select() was called, before any action such as .show() or .display() was triggered. This was valuable for catching issues early in job scripts, especially for schema validation and rapid typo detection.

Changes in Databricks Runtime Behavior (14.3 LTS → 16.4 LTS)

Starting around April 2025, users have observed that DBR 14.3 LTS stopped raising these exceptions right away. Errors now surface only when an action is performed, so typos and schema mismatches are discovered late in the workflow.

In DBR 16.4 LTS, the behavior changed again: instead of an immediate Python exception, a detailed JSON-formatted error message is logged to the console as soon as the unresolved column reference is made, before any action is triggered. The exception itself is still raised only when an action executes. This makes errors more visible up front, but does not fully restore the previous eager exception behavior.
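
To see which of these behaviors applies on a given cluster, one option is a small probe that records the first stage at which an error appears. This is only a sketch: probe_error_stage and its stage labels are hypothetical names, not part of PySpark or Databricks, and it assumes that reading df.schema forces plan analysis, which this thread does not confirm for every runtime. It also cannot detect the JSON message that 16.4 LTS logs without raising.

```python
from typing import Any, Callable


def probe_error_stage(build_df: Callable[[], Any]) -> str:
    """Return the first stage at which building or using a DataFrame raises.

    `build_df` is a zero-argument callable that applies the transformation
    under test, so that an eager error is caught inside this function.
    """
    try:
        df = build_df()  # transformation only; raises here if errors are eager
    except Exception:
        return "transformation"
    try:
        _ = df.schema  # assumption: schema access forces analysis, no job runs
    except Exception:
        return "schema access"
    try:
        df.limit(0).collect()  # an action that asks for zero rows
    except Exception:
        return "action"
    return "no error"
```

In a notebook where spark is already defined, it could be called as probe_error_stage(lambda: spark.table('default.example').select('nonexistent_column')). Catching every Exception is deliberate here, since the probe only reports where the failure happens, not what it is.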

Is This Logging Behavior Intentional?

There is no mention of this behavioral change in the official 16.4 LTS release notes, nor in the corresponding PySpark documentation updates as of November 2025. Discussions in Databricks forums and Stack Overflow speculate that this is likely an intentional change to improve visibility for notebook users, especially in interactive settings, but Databricks has not officially documented it.

Related Observations:

The change seems limited to console logging rather than exception propagation.
It is observed on All-purpose compute clusters, while Serverless clusters might show different behavior.
DBR 15.4 LTS behaves similarly to 14.3 LTS (errors deferred).

Configurations: Can You Restore Eager Exception Behavior?

There is no documented configuration or Spark/Databricks setting to restore the precise “eager error-throwing” behavior from earlier 14.3 LTS builds as of November 2025.

Spark configuration keys (spark.sql.*, etc.) do not control this aspect of unresolved column reference error-timing.
No Databricks workspace–level or cluster–level configuration restores pre-action validation to raise exceptions during transformations.
Workarounds exist, such as manually checking column existence before a select or using helper utilities, but none of them is an official configuration.
This behavior appears to be governed mostly by Databricks’ custom patches to the PySpark runtime, not by upstream Spark alone.
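
One form such a helper utility could take is a wrapper that touches df.schema immediately after a transformation, so that an analysis error, if the runtime raises one on schema access, appears at the line that introduced it. This is a sketch under that assumption, which the thread does not confirm for these runtimes; analyzed is a hypothetical name, not a PySpark or Databricks API.

```python
from typing import TypeVar

DF = TypeVar("DF")


def analyzed(df: DF) -> DF:
    """Touch df.schema so that any analysis error surfaces now, then return df."""
    # Assumption: reading the schema makes the runtime resolve column
    # references without executing a job. Verify this on your own cluster.
    _ = df.schema
    return df
```

Usage would look like df = analyzed(spark.table('default.example').select('nonexistent_column')). Because the function returns the same DataFrame, it can wrap any step in a chain without changing the result.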

Recommendations & Workarounds

Manual Schema Checks: To get eager validation, check for column existence before the .select() call:

if 'nonexistent_column' not in df.columns:
    raise ValueError("Column 'nonexistent_column' does not exist")
Monitor Release Notes: Check each runtime's release notes for changes to this behavior. If true eager errors are essential, consider lobbying Databricks support for feature restoration.
Watch for Future Runtime Updates: This aspect may change again in future runtime revisions.
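
The manual check above can be generalized into a small helper that validates several column names at once and suggests near matches, loosely imitating the hint in [UNRESOLVED_COLUMN.WITH_SUGGESTION]. A minimal sketch: require_columns is a hypothetical helper, not a PySpark function, and it only handles plain top-level column names (no expressions, aliases, or nested fields).

```python
import difflib
from typing import Iterable, List


def require_columns(available: Iterable[str], wanted: Iterable[str]) -> None:
    """Raise ValueError if any name in `wanted` is missing from `available`."""
    available = list(available)
    problems: List[str] = []
    for name in wanted:
        if name not in available:
            # Offer up to three similar names to help spot typos.
            close = difflib.get_close_matches(name, available, n=3)
            hint = f" (did you mean: {', '.join(close)}?)" if close else ""
            problems.append(f"'{name}'{hint}")
    if problems:
        raise ValueError("Missing column(s): " + "; ".join(problems))


# Intended use in a notebook (requires a live `spark` session):
#   src = spark.table('default.example')
#   require_columns(src.columns, ['nonexistent_column'])
#   df = src.select('nonexistent_column')
```

The check runs against the source table's columns before the select, so it does not depend on when the runtime chooses to analyze the plan. Note that the comparison is case-sensitive, whereas Spark resolves column names case-insensitively by default, so names may need normalizing first.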

Summary Table

Runtime Version	Eager Exception	Deferred Error	Console Logging	Config to Restore Eager Errors
14.3 LTS (pre-Apr 2025)	Yes	No	Minimal	No
14.3 LTS (post-Apr 2025)	No	Yes	Minimal	No
15.4 LTS	No	Yes	Minimal	No
16.4 LTS	No	Yes	JSON error	No

In summary:
The new logging behavior in DBR 16.4 LTS is undocumented and appears to have been added intentionally for visibility, but there is no configuration to restore the original eager error-throwing behavior. For production use or robust validation, manual schema checks remain the best workaround.

Marcus_S · 2 weeks ago

Hi Mark,

Thanks for the response. You wrote "If true eager errors are essential, consider lobbying Databricks support for feature restoration." Do you know where I can lobby or send suggestions to Databricks?

I'm also curious if you, as a Databricks employee, have any info about whether Databricks plans to document this change, or restore the old behavior? But it's fair enough if you don't have that info.

Thank you for your help once again.