Setting the right processingTime
04-03-2023 02:30 AM
How do I set just the right processingTime for readStream to maximize performance? Which factors does it depend on, and is there a way to measure this?
04-03-2023 04:17 AM
When you specify a trigger interval that is too small (less than tens of seconds), the system may perform unnecessary checks to see whether new data has arrived. Configure your processing time to balance latency requirements against the rate at which data arrives in the source.
There is no specific measure for this because it depends entirely on your use case.
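For illustration, here is a minimal PySpark sketch of where the interval is set. Note that the trigger is configured on the write side (`writeStream`), not on `readStream`. The function name `start_rate_query`, the built-in `rate` test source, the `console` sink, and the default of 10 seconds are assumptions chosen for the demo, not a recommendation for any particular workload.

```python
def start_rate_query(spark, interval="10 seconds"):
    """Start a demo streaming query that fires one micro-batch per `interval`.

    `spark` is an existing SparkSession (hypothetical helper, demo only).
    """
    # The built-in "rate" source generates rows continuously; it stands in
    # for a real source such as Kafka or a Delta table.
    events = (
        spark.readStream
        .format("rate")
        .option("rowsPerSecond", 100)
        .load()
    )

    # The trigger interval belongs to the streaming write, not the read.
    return (
        events.writeStream
        .format("console")
        .trigger(processingTime=interval)
        .start()
    )
```

Calling `start_rate_query(spark, "30 seconds")` would start the same query with a 30-second interval, which makes it easy to compare several candidate values.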
04-03-2023 10:00 AM
Reference doc - https://docs.databricks.com/structured-streaming/triggers.html
04-03-2023 10:09 PM
Hi @Ognjen Grubac
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!
04-20-2023 03:56 AM
Thanks @Ajay Pandey and @Nandini N for your answers. I wanted to know more about what I should do to set it properly. Should I try different processing times (1, 5, 10, 30, 60 seconds) and see how each affects the running job in terms of time and CPU/memory used? Or is there a more refined way to do it?
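One possible way to make the comparison above less ad hoc is to read the progress reports that Structured Streaming already records for each micro-batch. In PySpark, `query.recentProgress` returns a list of dictionaries that include `numInputRows` and `durationMs["triggerExecution"]`. The sketch below is only a starting point: the helper name `summarize_progress` and the "utilization" ratio are assumptions introduced for illustration, not an official metric.

```python
def summarize_progress(progress, interval_seconds):
    """Compare observed micro-batch durations with the trigger interval.

    `progress` is a list of dicts shaped like the entries of
    `query.recentProgress`; batches with no input rows are ignored.
    """
    durations = [
        p["durationMs"]["triggerExecution"] / 1000.0
        for p in progress
        if p.get("numInputRows", 0) > 0
    ]
    if not durations:
        return {"batches": 0, "avg_batch_seconds": None,
                "max_batch_seconds": None, "utilization": None}

    avg = sum(durations) / len(durations)
    return {
        "batches": len(durations),
        "avg_batch_seconds": avg,
        "max_batch_seconds": max(durations),
        # Hypothetical ratio: share of the interval spent processing.
        "utilization": avg / interval_seconds,
    }
```

With a running query, `summarize_progress(query.recentProgress, 10)` could be called for each candidate interval. A ratio near or above 1 suggests batches take about as long as the interval or longer, so the interval is not what limits latency; a ratio far below 1 suggests there is headroom to shorten the interval if lower latency is needed.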