Databricks Community

lprevost · ‎06-06-2025

[STREAM_FAILED] Query [id = 6a821fbc-490b-4ad8-891d-e4cacc2af1d6, runId = e055fede-8012-4369-861b-47183999e91d] terminated with exception: [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA] Streaming stateful operator name does not match with the operator in state metadata. This likely to happen when user adds/removes/changes stateful operator of existing streaming query. Stateful operators in the metadata: [(OperatorId: 0 -> OperatorName: dedupe)]; Stateful operators in current batch: []. SQLSTATE: 42K03 SQLSTATE: XXKST

I'm getting a streaming query error on a previously successful query on my latest run. I think this may have been introduced because I added some "isEmpty" logic to make the query fail gracefully if there is no incremental data to merge. I say this because of this part of the error:

Stateful operators in the metadata: [(OperatorId: 0 -> OperatorName: dedupe)]; Stateful operators in current batch: [].

My foreachBatch code:

def host_transform(df):
    return (df
        .selectExpr("rurl as host_rurl")
        .dropDuplicates(["host_rurl"])
        .withColumn("domain_rurl", extract.reverse(extract.sled_domain_extractor("host_rurl", reverse_url = True)))
        .withColumn("tags", F.array().cast(T.ArrayType(T.StringType(), True)))
    )

def upsertToDelta(batchdf, batchId):
    # Builds both host and domain unique rurls master tables with an autogenerated long int id as PK
    if not batchdf.isEmpty():
        batchdf = host_transform(batchdf)
        insert_values = {k: f"s.{k}" for k in batchdf.columns}
        print(f"Upsert being performed with batch {batchId} on {targetTableName} with target columns as {batchdf.columns}.")
        # targetTable and targetTableName must be set before each upsert
        (targetTable.alias("t").merge(
            batchdf.alias("s"),
            "s.host_rurl = t.host_rurl")
            .whenNotMatchedInsert(values=insert_values)  # all but id, which is autogenerated
            .execute()
        )
    else:
        print(f"New batch {batchId} is empty.")

Saritha_S · ‎06-11-2025

Hi @lprevost

Good day!!

Please find below my analysis for your issue.

Error:

[STREAM_FAILED] Query [id = 6a821fbc-490b-4ad8-891d-e4cacc2af1d6, runId = e055fede-8012-4369-861b-47183999e91d] terminated with exception: [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA] Streaming stateful operator name does not match with the operator in state metadata. This likely to happen when user adds/removes/changes stateful operator of existing streaming query. Stateful operators in the metadata: [(OperatorId: 0 -> OperatorName: dedupe)]; Stateful operators in current batch: []. SQLSTATE: 42K03 SQLSTATE: XXKST

Root cause:
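The STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA error is expected in this situation. The checkpoint's state metadata still records a stateful operator (the dedupe operator created by dropDuplicates / drop_duplicates), but the current streaming query no longer contains any stateful operator, so the two no longer match.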
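To illustrate the difference, here is a small plain-Python sketch (not Spark code; both function names are made up for this example). It assumes the dedupe in the posted code now runs inside the foreachBatch function, where it only sees one micro-batch at a time, whereas a dropDuplicates applied to the streaming DataFrame itself keeps state across batches in the checkpoint.

```python
def dedupe_per_batch(batches):
    """Stand-in for dropDuplicates inside foreachBatch: no memory across batches."""
    return [sorted(set(batch)) for batch in batches]


def dedupe_stateful(batches):
    """Stand-in for a streaming dedupe operator: remembers every key seen so far."""
    seen = set()  # plays the role of the operator state kept in the checkpoint
    result = []
    for batch in batches:
        fresh = sorted(set(batch) - seen)
        seen.update(fresh)
        result.append(fresh)
    return result


batches = [["a.com", "b.com", "a.com"], ["b.com", "c.com"]]
print(dedupe_per_batch(batches))
print(dedupe_stateful(batches))
```

In the per-batch version "b.com" shows up again in the second batch, because nothing is carried over. That carried-over state is what the checkpoint metadata refers to as the dedupe operator.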

Solution and recommendation:
The only recommended and reliable solution is to restart the streaming query with a new checkpoint location. This gives the query a clean slate and prevents unforeseen complications caused by mismatched or corrupted state.
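A minimal sketch of what that restart could look like, assuming the query is started with writeStream and foreachBatch as in the original post. The helper name start_with_fresh_checkpoint, the variable source_df, and the path in the usage comment are placeholders for illustration, not taken from the thread.

```python
def start_with_fresh_checkpoint(stream_df, batch_fn, checkpoint_path):
    """Start a foreachBatch streaming query against a checkpoint directory
    that has never been used by an earlier version of the query."""
    return (
        stream_df.writeStream
        .foreachBatch(batch_fn)                         # e.g. upsertToDelta
        .option("checkpointLocation", checkpoint_path)  # must be a NEW, unused path
        .start()
    )


# Hypothetical usage (source_df and the path are assumptions):
# query = start_with_fresh_checkpoint(
#     source_df, upsertToDelta, "/path/to/checkpoints/host_rurl_v2"
# )
```

Please note that a new checkpoint also discards the saved source offsets, so the stream will typically read the source again from its configured starting position. Since the merge in the post only inserts rows whose host_rurl is not already in the target, reprocessing should not create duplicate rows there, but it is worth verifying this on your side.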


Streaming query error - [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA]
