Hi @bi_123,
Thanks for checking that, and yes, I think your observation is reasonable. The general Databricks error handling docs say that in Python, you can use PySparkException.getErrorClass(), getSqlState(), and getMessageParameters() to programmatically inspect an error, but that guidance applies when the Python-side exception actually carries those structured fields. In the structured streaming case, the public docs for StreamingQuery.awaitTermination() and StreamingQuery.exception() only say that the query terminates because of an exception and that you get back a StreamingQueryException; they do not say that this wrapper will preserve the underlying JVM error class, SQLSTATE, or message parameters.
So if ex.getErrorClass(), ex.getSqlState(), and ex.getMessageParameters() are all None on the Python StreamingQueryException, I would not treat that as evidence that the original JVM exception had no structured metadata. It is more likely that by the time the failure is surfaced through the streaming wrapper on the Python side, those fields are no longer reliably available there. In other words, StreamingQueryException is useful for telling you that the stream failed, but not necessarily for exposing the full structured error payload of the original cause. That seems consistent with what you are seeing.
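To make that concrete, here is a minimal sketch of how I would read those fields defensively, so that a None (or a missing method on an older runtime) does not break your error handler. The helper names `_safe_call` and `describe_stream_failure` are my own and not part of PySpark; the only PySpark calls assumed are the three getters mentioned above.

```python
from typing import Any, Dict, Optional


def _safe_call(ex: BaseException, method_name: str) -> Optional[Any]:
    """Call ex.<method_name>() if it exists; return None if absent or if it raises."""
    method = getattr(ex, method_name, None)
    if not callable(method):
        return None
    try:
        return method()
    except Exception:
        # Treat any failure while reading metadata as "not available".
        return None


def describe_stream_failure(ex: BaseException) -> Dict[str, Any]:
    """Build a log record from an exception (hypothetical helper, not a PySpark API).

    Structured fields are recorded when present, but a None value is not
    interpreted as "the original JVM error had no structured metadata".
    """
    error_class = _safe_call(ex, "getErrorClass")
    return {
        "exception_type": type(ex).__name__,
        "error_class": error_class,
        "sql_state": _safe_call(ex, "getSqlState"),
        "message_parameters": _safe_call(ex, "getMessageParameters"),
        "has_structured_fields": error_class is not None,
        # Rendered message kept for human context only, not for parsing.
        "message": str(ex),
    }
```

You would call this from the `except` block around `query.awaitTermination()` and branch on `has_structured_fields` rather than assuming the fields are populated.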
Given that, I would suggest not making the Python StreamingQueryException your primary source of truth for schema evolution handling. For Auto Loader, the more reliable signal is still the behaviour documented in the schema inference and evolution docs: when Auto Loader detects a new column, it stops with an UnknownFieldException, but before failing it updates the schema stored under cloudFiles.schemaLocation with the merged schema for that micro-batch. So if your goal is robust logging, a better approach is to log the stream metadata you do have in Python (query id, run id, stream name, and the exception string for human context) and then derive the actual schema change by diffing the previous and latest schema snapshots in cloudFiles.schemaLocation.
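As a rough illustration of the diffing step, the sketch below compares two schema snapshots that have already been loaded as dicts in Spark's StructType JSON shape (a top-level "fields" list whose entries have "name" and "type"). I am deliberately not assuming anything about the file layout under cloudFiles.schemaLocation, so loading the previous and latest snapshots is left to you. The function names are hypothetical, and only top-level columns are compared.

```python
from typing import Any, Dict, List


def _fields_by_name(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Map top-level column name -> type from a StructType-style JSON dict."""
    return {field["name"]: field["type"] for field in schema.get("fields", [])}


def diff_schemas(previous: Dict[str, Any], latest: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return added, removed, and type-changed top-level columns.

    Hypothetical helper: nested struct fields are compared as whole values,
    so a change inside a nested struct shows up as a type change on its parent.
    """
    old = _fields_by_name(previous)
    new = _fields_by_name(latest)
    return {
        "added": sorted(name for name in new if name not in old),
        "removed": sorted(name for name in old if name not in new),
        "type_changed": sorted(
            name for name in new if name in old and new[name] != old[name]
        ),
    }
```

The result can be logged next to the query's `id`, `runId`, and `name`, which gives you a record of what changed without depending on the exception message.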
So my short answer would be: yes, this can very well be due to how the failure is surfaced through the streaming wrapper into Python. If the structured fields are None there, I would treat the Python exception as a notification that the stream failed and use cloudFiles.schemaLocation as the canonical source for what actually changed. That ends up being more stable than parsing the rendered message, and in practice it is usually more useful for future debugging as well.
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***