State store configuration with applyInPandasWithState for optimal performance
07-15-2024 10:20 PM
Hello,
We are running a stateful pipeline for data processing and analytics. We manage state with the applyInPandasWithState function; however, the state needs to persist across node restarts and similar failures.
At this point, we are not sure how to make the state persistent with applyInPandasWithState. Some articles mention using the RocksDB state store for persistence.
A couple of questions:
1. What configuration is required to enable the RocksDB state store with applyInPandasWithState?
2. Which RocksDB state store parameters can be tuned for optimal performance?
Any guidance on these would be appreciated.