Databricks has recently changed how unresolved column references are handled in PySpark on All-purpose compute clusters. In earlier Databricks Runtime (DBR) 14.3 LTS builds, referencing a non-existent column, such as:

```python
df = spark.table('default.example').select('nonexistent_column')
```

would immediately throw an exception (e.g., [UNRESOLVED_COLUMN.WITH_SUGGESTION]) when the .select() operation is called, even before lazy actions like .show() or .display() are triggered. This was valuable for catching issues early in job scripts, especially for schema validation and rapid typo detection.
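On those earlier builds, wrapping the transformation itself was enough to catch the problem. A minimal sketch of that eager behavior, assuming the error surfaces as a pyspark.errors.AnalysisException (the exact exception class is not pinned down in any release notes):

```python
from pyspark.errors import AnalysisException

try:
    # On pre-April-2025 DBR 14.3 LTS builds, analysis ran at transformation
    # time, so the bad column name failed here, before any action.
    df = spark.table('default.example').select('nonexistent_column')
except AnalysisException as e:
    # Reported with error class [UNRESOLVED_COLUMN.WITH_SUGGESTION]
    print(f"Caught at transformation time: {e}")
```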
Changes in Databricks Runtime Behavior (14.3 LTS → 16.4 LTS)
Starting around April 2025, many users observed that DBR 14.3 LTS stopped raising these exceptions right away: errors are only surfaced when an action is performed, resulting in deferred error reporting. Users found this problematic, as typos or schema mismatches are discovered late in the workflow.
In DBR 16.4 LTS, the behavior changed again: instead of an immediate Python exception, a detailed JSON-formatted error message is logged to the console as soon as the unresolved column reference is made, even before a lazy action. The exception itself is still raised, and propagated, only when an action actually executes. This makes errors more visible up front, but does not fully restore the earlier eager exception behavior.
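The contrast can be illustrated with the same query. A minimal sketch of the deferred flow, assuming the error still surfaces as a pyspark.errors.AnalysisException once an action runs:

```python
from pyspark.errors import AnalysisException

# On post-April-2025 14.3 LTS, on 15.x LTS, and on 16.4 LTS the transformation
# returns a DataFrame without raising; 16.4 LTS additionally logs a
# JSON-formatted error to the console at this point.
df = spark.table('default.example').select('nonexistent_column')

try:
    df.show()  # the Python exception only propagates once an action runs
except AnalysisException as e:
    print(f"Caught at action time: {e}")
```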
Is This Logging Behavior Intentional?
There is no mention of this behavioral change in the official 16.4 LTS release notes, nor in the corresponding PySpark documentation updates as of November 2025. Discussions on Databricks community forums and Stack Overflow suggest it is likely an intentional change to improve visibility for notebook users, especially in interactive settings, but Databricks has not officially documented it.
Related Observations:
- The change appears limited to console logging rather than exception propagation.
- It is observed on All-purpose compute clusters; Serverless clusters might show different behavior.
- DBR 15.x LTS behaves similarly to post-April 14.3 LTS (errors deferred).
Configurations: Can You Restore Eager Exception Behavior?
As of November 2025, there is no documented Spark or Databricks configuration that restores the precise "eager error-throwing" behavior of earlier 14.3 LTS builds.
- Spark configuration keys (spark.sql.*, etc.) do not control the timing of unresolved column reference errors.
- No Databricks workspace-level or cluster-level setting restores pre-action validation that raises exceptions during transformations.
- Workarounds include manually checking column existence before a select, or using helper utilities, but there is no official configuration for this.
- This behavior is governed largely by Databricks' custom patches to the PySpark runtime, not by upstream Spark alone.
Recommendations & Workarounds
- Manual Schema Checks: To get eager validation, check for column existence before the .select() call (a more reusable helper is sketched after this list):

  ```python
  if 'nonexistent_column' not in df.columns:
      raise ValueError("Column does not exist")
  ```

- Monitor Release Notes: If true eager errors are essential, consider raising the issue with Databricks support to request restoration of the feature, and watch the release notes for changes.
- Watch for Future Runtime Updates: This aspect may change again in future runtime revisions.
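A more reusable version of the manual check can be wrapped in a small helper. The sketch below is illustrative only (validate_and_select is a hypothetical name, not a Spark or Databricks API) and assumes plain column names rather than expressions:

```python
from pyspark.sql import DataFrame


def validate_and_select(df: DataFrame, *cols: str) -> DataFrame:
    """Raise immediately if any requested column is missing from df's schema."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns {missing}; available: {df.columns}")
    return df.select(*cols)


# Fails here, at transformation time, regardless of runtime version:
selected = validate_and_select(spark.table('default.example'), 'nonexistent_column')
```

This only restores eager failure for simple name-based selects; expressions such as col('a') + 1 are not validated by the check.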
Summary Table
| Runtime Version | Eager Exception | Deferred Error | Console Logging | Config Option |
| --- | --- | --- | --- | --- |
| 14.3 LTS (pre-April 2025) | Yes | No | Minimal | No |
| 14.3 LTS (post-April 2025) | No | Yes | Minimal | No |
| 15.4 LTS | No | Yes | Minimal | No |
| 16.4 LTS | No | Yes | JSON error | No |
In summary:
Databricks' new logging behavior in 16.4 LTS is not fully documented and appears to have been added intentionally for visibility, but there is no configuration to restore the original eager error-throwing behavior. For production or robust validation, manual schema checks remain the best workaround.