Resolved! Concurrent State Update from Worker Nodes Possible?
For a data processing pipeline I use Structured Streaming with arbitrary stateful processing. I was wondering whether partitioning across several worker nodes, and thus updating the state from different worker nodes, has to be taken into account (e.g. using a lock...
- 871 Views
- 1 reply
- 0 kudos
Latest Reply
Hi @fperry, when using applyInPandasWithState in PySpark, updates to each group’s state are automatically saved across invocations. The function you provide should take parameters (key, Iterator[pandas.DataFrame], state) and return another Iterator[...
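As a rough illustration of the function shape described above, here is a minimal sketch of a per-key running count. The names count_events, FakeState, run_local_demo, the column user, and the streaming DataFrame events are all hypothetical and not from the original thread. FakeState is only a local stand-in that mimics the exists/get/update members of PySpark's GroupState so the function can be tried without a cluster. Note that the function takes no lock: it only touches the state object handed to it for its own key.

```python
from typing import Iterator, Tuple

import pandas as pd


def count_events(key: Tuple, pdfs: Iterator[pd.DataFrame], state) -> Iterator[pd.DataFrame]:
    """Keep a running row count for one key.

    When Spark calls this, `state` is a GroupState that holds only this
    key's state; here it may also be the FakeState stand-in below.
    """
    # On the first invocation for a key there is no state yet.
    (count,) = state.get if state.exists else (0,)
    for pdf in pdfs:
        count += len(pdf)
    # Persist the new count so the next invocation for this key sees it.
    state.update((count,))
    yield pd.DataFrame({"user": [key[0]], "count": [count]})


class FakeState:
    """Hypothetical stand-in for GroupState, for local experiments only."""

    def __init__(self):
        self._value = None

    @property
    def exists(self):
        return self._value is not None

    @property
    def get(self):
        return self._value

    def update(self, value):
        self._value = value


def run_local_demo():
    """Call the function twice for key "a", as two micro-batches would."""
    state = FakeState()
    batch1 = [pd.DataFrame({"user": ["a", "a"]})]
    batch2 = [pd.DataFrame({"user": ["a"]})]
    first = next(count_events(("a",), iter(batch1), state))
    second = next(count_events(("a",), iter(batch2), state))
    return int(first["count"][0]), int(second["count"][0])


# Wiring into Spark (assumes a streaming DataFrame `events` with a
# string column `user`; shown as a comment because it needs a cluster):
#
# from pyspark.sql.streaming.state import GroupStateTimeout
# events.groupBy("user").applyInPandasWithState(
#     count_events,
#     outputStructType="user STRING, count LONG",
#     stateStructType="count LONG",
#     outputMode="update",
#     timeoutConf=GroupStateTimeout.NoTimeout,
# )
```

Calling run_local_demo() shows the count carried over from the first call to the second for the same key, which is the behaviour the reply describes.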
- 0 kudos