For a data processing pipeline I use Structured Streaming with arbitrary stateful processing. When using applyInPandasWithState, do I have to account for the data being partitioned across several worker nodes, i.e. for the state potentially being updated from different worker nodes (e.g. by using a lock)? Or is that handled automatically by PySpark and Databricks and abstracted away?
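For context, here is a minimal sketch of the kind of stateful operation I mean (the running-count logic, the `events` stream, and all column/schema names are illustrative, not my actual pipeline):

```python
from typing import Iterator, Tuple

import pandas as pd
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

# Illustrative schemas, expressed as DDL strings.
output_schema = "device_id STRING, event_count LONG"
state_schema = "event_count LONG"

def update_count(
    key: Tuple[str],
    pdfs: Iterator[pd.DataFrame],
    state: GroupState,
) -> Iterator[pd.DataFrame]:
    # Read the existing per-key state, or start from zero.
    (count,) = state.get if state.exists else (0,)
    # Add the number of new rows seen for this key in this micro-batch.
    for pdf in pdfs:
        count += len(pdf)
    # Write the updated count back to the per-key state.
    state.update((count,))
    yield pd.DataFrame({"device_id": [key[0]], "event_count": [count]})

result = (
    events  # a streaming DataFrame, assumed to exist
    .groupBy("device_id")
    .applyInPandasWithState(
        update_count,
        output_schema,
        state_schema,
        "update",
        GroupStateTimeout.NoTimeout,
    )
)
```

In other words: could two invocations of `update_count` for the same key ever run concurrently on different executors, so that I would need to synchronise the `state.get`/`state.update` sequence myself?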
Thank you