Databricks Community

sajith_appukutt · 06-09-2021

Though the data volume is relatively even, the streaming aggregation query is showing highly variable micro-batch processing times

sajith_appukutt · 06-17-2021

By default, the state data for a streaming aggregation query is maintained in the JVM memory of the executors, and a large number of state objects can put memory pressure on the JVM, causing long GC pauses. If your streaming query has stateful operations, it is recommended to use the more optimized state management solution based on RocksDB.
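As a rough sketch of what switching to RocksDB looks like, the snippet below builds the Spark configuration entry that selects the state store provider and applies it to a session. The helper names `rocksdb_state_store_conf` and `apply_conf` are made up for illustration. The Databricks provider class is assumed to apply on Databricks Runtime, and the Apache class on open-source Spark 3.2 or later, so check which one matches your environment.

```python
# Config key that selects the state store implementation for stateful streaming queries.
STATE_STORE_PROVIDER_KEY = "spark.sql.streaming.stateStore.providerClass"

# Assumption: the first class is for Databricks Runtime, the second for
# open-source Apache Spark 3.2+. Verify against your runtime version.
DATABRICKS_ROCKSDB_PROVIDER = (
    "com.databricks.sql.streaming.state.RocksDBStateStoreProvider"
)
APACHE_ROCKSDB_PROVIDER = (
    "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"
)


def rocksdb_state_store_conf(on_databricks=True):
    """Return the conf entries that select the RocksDB state store (hypothetical helper)."""
    provider = DATABRICKS_ROCKSDB_PROVIDER if on_databricks else APACHE_ROCKSDB_PROVIDER
    return {STATE_STORE_PROVIDER_KEY: provider}


def apply_conf(spark, conf):
    """Set each entry on an existing SparkSession (hypothetical helper)."""
    for key, value in conf.items():
        spark.conf.set(key, value)
```

In a notebook this would be called as `apply_conf(spark, rocksdb_state_store_conf())` before the streaming query is started, since the provider is read when the query starts.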

More details at https://docs.databricks.com/spark/latest/structured-streaming/production.html#optimize-performance-o...
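To check whether state size is plausibly behind the variable batch times, one option is to line up batch duration against state size using the query's progress events. The sketch below assumes each event is a dict shaped like the JSON that Structured Streaming reports (as returned by `query.recentProgress` in PySpark), with `batchId`, `durationMs.triggerExecution`, and per-operator `numRowsTotal` and `memoryUsedBytes`. The function name `summarize_progress` is hypothetical.

```python
def summarize_progress(progress_events):
    """Reduce progress events to batch duration and total state size per batch."""
    summary = []
    for event in progress_events:
        # A query can have several stateful operators; add up their state.
        operators = event.get("stateOperators", [])
        summary.append({
            "batchId": event.get("batchId"),
            "triggerExecutionMs": event.get("durationMs", {}).get("triggerExecution"),
            "stateRows": sum(op.get("numRowsTotal", 0) for op in operators),
            "stateMemoryBytes": sum(op.get("memoryUsedBytes", 0) for op in operators),
        })
    return summary
```

With a running query handle named `query`, `summarize_progress(query.recentProgress)` gives one row per recent micro-batch. If the slow batches coincide with a large or growing `stateMemoryBytes`, that is consistent with the state held on the JVM heap contributing to the GC pauses.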

I have a streaming aggregation query with highly variable micro-batch processing times. I am seeing a lot of GC pauses in the logs. Any pointers on how to debug this?
