Databricks Community

bi_123 · 4 weeks ago

Hi,

When schema evolution is detected, Auto Loader throws an UNKNOWN_FIELD_EXCEPTION, and the error message includes schema information along with other related details. However, when I log the full message, it is too long and contains information that can make debugging more confusing.

What are the best practices for logging schema evolution exceptions so that the logs contain meaningful information for future debugging?

I initially tried parsing the message by searching for an identifying phrase that I assumed would always be present. However, I later found that this is not reliable: the exception message varies depending on the type of schema evolution, such as a new field, type widening, or other schema changes.

Because of this, the current parsing approach is not robust. What would be a better way to extract and log the most useful information from these schema evolution exceptions?

Ashwin_DSA · 4 weeks ago

Hi @bi_123,

I would avoid parsing the full rendered UNKNOWN_FIELD_EXCEPTION message. Databricks explicitly notes in the error-handling documentation that the rendered and parameterised messages are not stable across releases, so any logic that depends on a specific phrase being present can break as the wording changes. A more robust approach is to handle the exception using the structured fields exposed by Spark or PySpark, such as getErrorClass(), getSqlState(), and getMessageParameters(), and log those values instead of trying to slice up str(e).

For Auto Loader specifically, the schema inference and evolution documentation explains that when a new column is detected, the stream stops with an UnknownFieldException, but before it fails, Auto Loader updates the schema stored under cloudFiles.schemaLocation. In practice, that means the most useful thing to log for future debugging is usually not the full exception text, but a compact summary that includes the error class, SQLSTATE, message parameters, the stream or query identifiers, the relevant source path if it is available, and the schema location or a schema diff from the latest schema snapshot.

So my recommendation would be to treat the rendered exception text as human-readable context only, ideally truncated, and rely on structured exception metadata plus the schema state in cloudFiles.schemaLocation for anything programmatic. That approach is much more resilient across cases like new fields, type widening, and other schema changes, and it keeps the logs focused on the details that are actually useful when someone needs to debug the issue later. The same Auto Loader documentation also covers the different schema evolution modes, including addNewColumns, addNewColumnsWithTypeWidening, and rescue, which is another reason not to assume a single message shape will always apply across every schema change scenario.
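A minimal sketch of that compact summary, in Python. It assumes only that the caught exception may or may not expose getErrorClass(), getSqlState(), and getMessageParameters(), so each is looked up defensively and logged as None when absent. The helper names (`summarize_exception`, `_call_if_present`), the logger name, and the 500-character cut-off are illustrative choices, not part of any Databricks API. Newer PySpark releases appear to prefer getCondition() over getErrorClass(); the sketch tries both, but treat that as an assumption to verify against your runtime.

```python
import json
import logging

logger = logging.getLogger("autoloader.schema_evolution")  # hypothetical logger name


def _call_if_present(exc, method_name):
    """Call exc.<method_name>() if it exists; return None if absent or failing."""
    method = getattr(exc, method_name, None)
    if not callable(method):
        return None
    try:
        return method()
    except Exception:
        return None


def summarize_exception(exc, max_message_chars=500):
    """Build a compact, JSON-serializable summary of an exception."""
    message = str(exc)
    if len(message) > max_message_chars:
        # Keep the rendered text as human context only, truncated.
        message = message[:max_message_chars] + "...[truncated]"
    # Assumption: getCondition() is the newer accessor; fall back to getErrorClass().
    error_class = (_call_if_present(exc, "getCondition")
                   or _call_if_present(exc, "getErrorClass"))
    params = _call_if_present(exc, "getMessageParameters")
    return {
        "exception_type": type(exc).__name__,
        "error_class": error_class,
        "sql_state": _call_if_present(exc, "getSqlState"),
        "message_parameters": dict(params) if params else None,
        "message_excerpt": message,
    }


# Usage sketch (assumes `query` is an already started streaming query):
# try:
#     query.awaitTermination()
# except Exception as exc:
#     logger.error("stream failed: %s", json.dumps(summarize_exception(exc), default=str))
```

Because every structured field is optional here, the same function still produces a usable log line when those fields come back empty.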

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

bi_123 · 3 weeks ago

As far as I know, in PySpark, exceptions raised on the JVM side are wrapped and passed to the Python driver, where they appear as PySparkException (or one of its subclasses), and some information might be lost during this conversion. I tried catching the exception as StreamingQueryException, since the documentation says that awaitTermination() throws it. The exception is caught; however, ex.getErrorClass(), ex.getSqlState(), and ex.getMessageParameters() all return None. Could this be related to the JVM-to-Python exception conversion I mentioned above? If so, what are my other options?

Ashwin_DSA · 3 weeks ago

Hi @bi_123,

Thanks for checking that, and yes, I think your observation is reasonable. The general Databricks error handling docs say that in Python, you can use PySparkException.getErrorClass(), getSqlState(), and getMessageParameters() to programmatically inspect an error, but that guidance applies when the Python-side exception actually carries those structured fields. In the structured streaming case, the public docs for StreamingQuery.awaitTermination() and StreamingQuery.exception() only say that the query terminates because of an exception and that you get back a StreamingQueryException; they do not say that this wrapper will preserve the underlying JVM error class, SQLSTATE, or message parameters.

So if ex.getErrorClass(), ex.getSqlState(), and ex.getMessageParameters() are all None on the Python StreamingQueryException, I would not treat that as evidence that the original JVM exception had no structured metadata. It is more likely that by the time the failure is surfaced through the streaming wrapper on the Python side, those fields are no longer reliably available there. In other words, StreamingQueryException is useful for telling you that the stream failed, but not necessarily for exposing the full structured error payload of the original cause. That seems consistent with what you are seeing.

Given that, I would suggest not making the Python StreamingQueryException your primary source of truth for schema evolution handling. For Auto Loader, the more reliable signal is still the behaviour documented in the schema inference and evolution docs: when Auto Loader detects a new column, it stops with an UnknownFieldException, but before failing it updates the schema stored under cloudFiles.schemaLocation with the merged schema for that micro-batch. So if your goal is robust logging, a better approach is to log the stream metadata you do have in Python (query id, run id, stream name, and the exception string for human context) and then derive the actual schema change by diffing the previous and latest schema snapshots in cloudFiles.schemaLocation.
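The diffing step could look roughly like the sketch below. It assumes the two snapshots have already been loaded as Python dicts in the JSON shape that Spark's StructType.jsonValue() produces. How the snapshots are read from cloudFiles.schemaLocation is deliberately left out, because the on-disk layout is not described in this thread. The function names `diff_schemas` and `_flatten` are hypothetical.

```python
import json


def _flatten(schema, prefix=""):
    """Map each dotted field path to a comparable type description."""
    out = {}
    for field in schema.get("fields", []):
        path = prefix + field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Nested struct: record it, then recurse into its fields.
            out[path] = "struct"
            out.update(_flatten(ftype, prefix=path + "."))
        elif isinstance(ftype, str):
            out[path] = ftype  # leaf type such as "long" or "string"
        else:
            # Arrays, maps, etc. are compared as a whole via their JSON form.
            out[path] = json.dumps(ftype, sort_keys=True)
    return out


def diff_schemas(previous, latest):
    """Return added, removed, and type-changed field paths between two schemas."""
    old, new = _flatten(previous), _flatten(latest)
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "type_changed": {
            p: {"from": old[p], "to": new[p]}
            for p in sorted(old.keys() & new.keys())
            if old[p] != new[p]
        },
    }


# Illustrative snapshots (made-up field names, not taken from a real stream).
previous = {"type": "struct", "fields": [
    {"name": "id", "type": "integer", "nullable": True, "metadata": {}},
    {"name": "name", "type": "string", "nullable": True, "metadata": {}},
]}
latest = {"type": "struct", "fields": [
    {"name": "id", "type": "long", "nullable": True, "metadata": {}},
    {"name": "name", "type": "string", "nullable": True, "metadata": {}},
    {"name": "email", "type": "string", "nullable": True, "metadata": {}},
]}

print(json.dumps(diff_schemas(previous, latest)))
# {"added": ["email"], "removed": [], "type_changed": {"id": {"from": "integer", "to": "long"}}}
```

Logging this small dict alongside the stream identifiers covers new fields and type changes in one shape, without depending on the wording of the exception message.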

So my short answer would be: yes, this can very well be due to how the failure is surfaced through the streaming wrapper into Python. If the structured fields are None there, I would treat the Python exception as a notification that the stream failed and use cloudFiles.schemaLocation as the canonical source for what actually changed. That ends up being more stable than parsing the rendered message, and in practice it is usually more useful for future debugging as well.
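As a sketch of treating the exception purely as a notification, the record below captures only what is reliably available on the Python side: the query id, run id, and name from the streaming query object, a truncated exception string, and a pointer to the schema location. It assumes `query` exposes `id`, `runId`, and `name` attributes, as a PySpark StreamingQuery does. The function name `build_failure_record`, the record keys, and the example path are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("autoloader.stream_failure")  # hypothetical logger name


def _as_text(value):
    """Stringify a value for logging, keeping None as None."""
    return None if value is None else str(value)


def build_failure_record(query, exc, schema_location, max_message_chars=300):
    """Build a JSON-serializable 'stream failed' notification record."""
    message = str(exc)
    truncated = len(message) > max_message_chars
    return {
        "event": "stream_failed",
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "query_id": _as_text(getattr(query, "id", None)),
        "run_id": _as_text(getattr(query, "runId", None)),
        "query_name": _as_text(getattr(query, "name", None)),
        "exception_type": type(exc).__name__,
        # Human context only; not intended for programmatic parsing.
        "message_excerpt": message[:max_message_chars],
        "message_truncated": truncated,
        # Where to look for what actually changed.
        "schema_location": schema_location,
    }


# Usage sketch (assumes `query` is an already started streaming query and
# "/path/to/schema" is a placeholder for your cloudFiles.schemaLocation):
# try:
#     query.awaitTermination()
# except Exception as exc:
#     record = build_failure_record(query, exc, "/path/to/schema")
#     logger.error(json.dumps(record))
#     raise
```

Re-raising after logging keeps the job's failure behaviour unchanged, so any restart or retry policy around the stream still sees the original exception.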
