Concurrent State Update from Worker Nodes Possible?
06-18-2024 06:42 AM
For a data processing pipeline I use Structured Streaming with arbitrary stateful processing. With applyInPandasWithState, the data is partitioned across several worker nodes, so in principle the state could be updated from different workers at the same time. Do I have to account for that myself (e.g. by using a lock), or is it handled automatically by PySpark and Databricks and abstracted away?
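To make the question concrete, here is a toy model in plain Python of the behaviour I am hoping for. It is only a sketch of my assumption, not Spark's actual implementation: `NUM_WORKERS`, `owner_of` and `run_micro_batch` are hypothetical names, and the hash routing merely stands in for the shuffle by grouping key. If every key is always routed to the same worker, each key's state has a single writer and no lock should be needed.

```python
from collections import defaultdict
from zlib import crc32

NUM_WORKERS = 3  # hypothetical cluster size, for illustration only


def owner_of(key: str) -> int:
    """Deterministically map a grouping key to one worker (stand-in for the shuffle)."""
    return crc32(key.encode("utf-8")) % NUM_WORKERS


def run_micro_batch(events, worker_states):
    """Route each (key, value) event to its owning worker and update that worker's local state."""
    for key, value in events:
        state = worker_states[owner_of(key)]   # only the owning worker touches this key
        state[key] = state.get(key, 0) + value  # running sum as the per-key state


# One private state dict per simulated worker; nothing is shared between them.
worker_states = [dict() for _ in range(NUM_WORKERS)]
run_micro_batch([("a", 1), ("b", 2), ("a", 3)], worker_states)
run_micro_batch([("b", 5), ("c", 1)], worker_states)

# Record which workers hold state for each key.
owners = defaultdict(set)
for worker, state in enumerate(worker_states):
    for key in state:
        owners[key].add(worker)

# Combine the per-worker states to inspect the final per-key totals.
merged = {key: value for state in worker_states for key, value in state.items()}
```

In this model every key ends up in exactly one worker's state across both micro-batches. Is that also the guarantee I get for the state object passed to my function in applyInPandasWithState?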
Thank you