Saritha_S
Databricks Employee

Hi @lprevost 

Good day!!

Please find below my analysis for your issue. 

Error:

[STREAM_FAILED] Query [id = 6a821fbc-490b-4ad8-891d-e4cacc2af1d6, runId = e055fede-8012-4369-861b-47183999e91d] terminated with exception: [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA] Streaming stateful operator name does not match with the operator in state metadata. This likely to happen when user adds/removes/changes stateful operator of existing streaming query. Stateful operators in the metadata: [(OperatorId: 0 -> OperatorName: dedupe)]; Stateful operators in current batch: []. SQLSTATE: 42K03 SQLSTATE: XXKST

Root cause:
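The STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA error is expected here. The existing checkpoint's state metadata still records a stateful operator (OperatorId 0, dedupe), but the current query no longer contains one. This happens when a stateful operator such as drop_duplicates is removed from a streaming query that already has a checkpoint.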

Solution and recommendation:
The only recommended and reliable solution is to restart the streaming query with a new checkpoint location. A new checkpoint gives the query a clean slate, so the state recorded for the old query plan cannot conflict with the modified one, and it avoids unforeseen complications from mismatched or corrupted state.
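
As a rough illustration of the restart, here is a minimal PySpark-style sketch. The helper names (`new_checkpoint_path`, `restart_stream`), the versioned-directory naming scheme, and the example paths and table name are all assumptions for illustration and are not taken from your job. Only `writeStream`, `option("checkpointLocation", ...)`, and `toTable(...)` are standard Structured Streaming calls.

```python
def new_checkpoint_path(base: str, version: int) -> str:
    """Build a versioned checkpoint directory (hypothetical naming scheme)."""
    return f"{base.rstrip('/')}/v{version}"


def restart_stream(stream_df, checkpoint_base: str, version: int, target_table: str):
    """Start the modified query against a checkpoint directory with no prior state.

    `stream_df` is assumed to be the streaming DataFrame that no longer
    contains the drop_duplicates step.
    """
    path = new_checkpoint_path(checkpoint_base, version)
    return (
        stream_df.writeStream
        .option("checkpointLocation", path)  # fresh location, no old state metadata
        .toTable(target_table)
    )


# Example usage (placeholder path and table name):
# query = restart_stream(df, "/Volumes/main/chk/events", 2, "main.default.events")
```

Keep in mind that the source offsets are also stored in the checkpoint, so a query started on a new checkpoint begins reading according to the source's starting options and may reprocess data that was already written to the target.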