Chaining stateful operators

Maatari
New Contributor III

I would like to do a groupBy followed by a join in Structured Streaming. I would read from two Delta tables in snapshot mode, i.e. the latest snapshot.

My question is specifically about chaining stateful operators.

groupBy runs in update mode.

Chaining groupBy and join must use append mode overall.

But does that mean the groupBy would behave as if it were in append mode as well, or can the groupBy run in update mode and the join in append mode?


mark_ott
Databricks Employee

When chaining stateful operators like groupBy (aggregation) and join in Spark Structured Streaming, there are specific rules about the output mode required for the overall query and the behavior of each operator.

Output Mode Requirements

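  • The groupBy operator (stateful aggregation) supports update and complete output modes because it may update existing aggregated values as new data arrives. It supports append mode only when a watermark is defined on the event-time column it groups by (typically a time window), because each aggregated row can then be emitted once, after it is finalized.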

  • A join between two streaming DataFrames (a stream-stream join) supports only append output mode, meaning only newly joined rows are emitted downstream.
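To make the difference between the two aggregation modes concrete, here is a small pure-Python toy model of a windowed count. It is a hypothetical simplification, not Spark's implementation or API; the function `windowed_counts` and its parameters are made up for illustration. In update mode it emits every group that changed in a micro-batch; in append mode it emits a group only once, after the watermark (max event time seen minus an allowed delay) passes the end of its window. For simplicity the toy advances the watermark as soon as a batch is processed, whereas Spark applies a new watermark from the next micro-batch, so real output tends to lag by one more batch.

```python
from collections import defaultdict


def windowed_counts(batches, window_size, delay, mode):
    """Toy model (NOT Spark's implementation) of a windowed count.

    batches: list of micro-batches, each a list of (event_time, key) pairs.
    mode: "update" or "append".
    Returns one list of emitted rows (window_start, key, count) per micro-batch.
    """
    state = defaultdict(int)  # (window_start, key) -> running count
    watermark = float("-inf")  # simplified: advanced right after each batch
    max_seen = float("-inf")
    outputs = []
    for batch in batches:
        changed = set()
        for ts, key in batch:
            start = ts - ts % window_size
            if start + window_size <= watermark:
                continue  # window already finalized: late row is dropped
            state[(start, key)] += 1
            changed.add((start, key))
            max_seen = max(max_seen, ts)
        watermark = max_seen - delay
        if mode == "update":
            # Update: emit every group whose aggregate changed in this batch.
            rows = [(s, k, state[(s, k)]) for (s, k) in sorted(changed)]
        else:
            # Append: emit a group once, after the watermark passes its window
            # end, then drop its state so it is never emitted or updated again.
            done = sorted(g for g in state if g[0] + window_size <= watermark)
            rows = [(s, k, state.pop((s, k))) for (s, k) in done]
        outputs.append(rows)
    return outputs
```

With `batches = [[(1, "a"), (2, "a")], [(4, "a"), (12, "b")], [(25, "b")]]`, a window size of 10 and a delay of 5, update mode emits the window starting at 0 for key "a" twice (count 2, then 3), while append mode emits nothing until the third batch and then emits `(0, "a", 3)` and `(10, "b", 1)` once each. This is why append mode on an aggregation needs a watermark: without one, no group could ever be declared final.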

Behavior When Chaining Operators

  • When you chain a groupBy followed by a streaming join, the overall query must run in append mode, because stream-stream joins support only append output and Spark supports chaining multiple stateful operators only in append mode.

  • This does not mean the groupBy operator changes how it maintains state: it still keeps per-group state and recalculates the aggregates as new data arrives. What changes is what it emits. In append mode the aggregation emits each group only once, after the watermark passes the end of its window and the value is final; intermediate values stay in state and are never sent to the join. So the groupBy cannot run in update mode while the join runs in append mode: the output mode applies to the whole query, which is why the aggregation needs a watermark.
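A toy model can also show why the aggregation feeding the join has to emit append-style rows. The sketch below is a hypothetical append-only join, not Spark's join implementation: the other join input is a fixed dict for simplicity, and rows written to the sink can never be retracted. The upstream rows are hard-coded example values of what a windowed count with a watermark might emit per micro-batch in each mode; all names here are made up for illustration.

```python
# Per-micro-batch output of an upstream windowed count: (window_start, key, count).
# Hard-coded example values for illustration only.
UPDATE_OUTPUT = [[(0, "a", 2)], [(0, "a", 3), (10, "b", 1)], [(20, "b", 1)]]
APPEND_OUTPUT = [[], [], [(0, "a", 3), (10, "b", 1)]]
OTHER_SIDE = {"a": "x", "b": "y"}  # hypothetical other join input, keyed by key


def append_only_join(agg_batches, other_side):
    """Toy append-only join: every matched row is appended to the sink for good."""
    sink = []
    for rows in agg_batches:
        for start, key, count in rows:
            if key in other_side:
                # An appended row cannot be updated or retracted later.
                sink.append((start, key, count, other_side[key]))
    return sink
```

Fed with the update-mode rows, the sink ends up with both `(0, "a", 2, "x")` and `(0, "a", 3, "x")` for the same window, and the stale row cannot be removed. Fed with the append-mode rows, it contains only the final `(0, "a", 3, "x")` and `(10, "b", 1, "y")`.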
