<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I have a streaming aggregation query with highly variable micro-batch processing times. Seeing a lot of GC pauses in the logs. Any pointers on how to debug? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/i-have-a-streaming-aggregation-query-with-highly-variable-micro/m-p/25454#M17695</link>
    <description>&lt;P&gt;By default, the state data of a streaming aggregation query is kept in the JVM memory of the executors, and a large number of state objects can put memory pressure on the JVM, causing long GC pauses. If your streaming query has stateful operations, it is recommended to use a more optimized state management solution based on &lt;A href="https://rocksdb.org/" alt="https://rocksdb.org/" target="_blank"&gt;RocksDB&lt;/A&gt;, which keeps state in native memory and on local disk rather than on the JVM heap.&lt;/P&gt;&lt;P&gt;More details at &lt;A href="https://docs.databricks.com/spark/latest/structured-streaming/production.html#optimize-performance-of-stateful-streaming-queries" target="_blank"&gt;https://docs.databricks.com/spark/latest/structured-streaming/production.html#optimize-performance-of-stateful-streaming-queries&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 17 Jun 2021 23:14:58 GMT</pubDate>
    <dc:creator>sajith_appukutt</dc:creator>
    <dc:date>2021-06-17T23:14:58Z</dc:date>
    <item>
      <title>I have a streaming aggregation query with highly variable micro-batch processing times. Seeing a lot of GC pauses in the logs. Any pointers on how to debug?</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-streaming-aggregation-query-with-highly-variable-micro/m-p/25453#M17694</link>
      <description>&lt;P&gt;Although the data volume is relatively even, the streaming aggregation query shows highly variable micro-batch processing times.&lt;/P&gt;</description>
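      <!--
A minimal sketch for quantifying the variability described in this question. It assumes the progress entries are dicts shaped like the StreamingQueryProgress JSON (for example, the entries of PySpark's query.recentProgress, which PySpark 3.x returns as dicts); the helper name summarize_progress is hypothetical. It lines up each batch's triggerExecution time with the total state rows and state memory reported under stateOperators, so you can check whether slow batches coincide with a large or growing state.

```python
from statistics import mean, pstdev


def summarize_progress(progress_list):
    """Summarize micro-batch durations and state size from progress dicts.

    Each entry is assumed to follow the StreamingQueryProgress JSON layout:
    batchId, durationMs.triggerExecution and a stateOperators list.
    """
    if not progress_list:
        raise ValueError("no progress entries to summarize")

    batches = []
    for p in progress_list:
        ops = p.get("stateOperators", [])
        batches.append({
            "batchId": p["batchId"],
            # Wall-clock time of the whole micro-batch, in milliseconds.
            "triggerMs": p["durationMs"]["triggerExecution"],
            # Totals across all stateful operators in the query.
            "stateRows": sum(op.get("numRowsTotal", 0) for op in ops),
            "stateBytes": sum(op.get("memoryUsedBytes", 0) for op in ops),
        })

    durations = [b["triggerMs"] for b in batches]
    return {
        "batches": batches,
        "meanMs": mean(durations),
        "stdevMs": pstdev(durations),
        "maxMs": max(durations),
        # The slowest batch is a natural starting point for matching against GC logs.
        "slowest": max(batches, key=lambda b: b["triggerMs"]),
    }
```

For example, summarize_progress(query.recentProgress) on a running query. A standard deviation that is large relative to the mean, with the slowest batches lining up with high stateBytes, is consistent with state-related memory pressure on the JVM; the GC log timestamps remain the direct evidence.
-->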
      <pubDate>Wed, 09 Jun 2021 08:20:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-streaming-aggregation-query-with-highly-variable-micro/m-p/25453#M17694</guid>
      <dc:creator>sajith_appukutt</dc:creator>
      <dc:date>2021-06-09T08:20:06Z</dc:date>
    </item>
    <item>
      <title>Re: I have a streaming aggregation query with highly variable micro-batch processing times. Seeing a lot of GC pauses in the logs. Any pointers on how to debug?</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-streaming-aggregation-query-with-highly-variable-micro/m-p/25454#M17695</link>
      <description>&lt;P&gt;By default, the state data of a streaming aggregation query is kept in the JVM memory of the executors, and a large number of state objects can put memory pressure on the JVM, causing long GC pauses. If your streaming query has stateful operations, it is recommended to use a more optimized state management solution based on &lt;A href="https://rocksdb.org/" alt="https://rocksdb.org/" target="_blank"&gt;RocksDB&lt;/A&gt;, which keeps state in native memory and on local disk rather than on the JVM heap.&lt;/P&gt;&lt;P&gt;More details at &lt;A href="https://docs.databricks.com/spark/latest/structured-streaming/production.html#optimize-performance-of-stateful-streaming-queries" target="_blank"&gt;https://docs.databricks.com/spark/latest/structured-streaming/production.html#optimize-performance-of-stateful-streaming-queries&lt;/A&gt;&lt;/P&gt;</description>
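      <!--
A minimal sketch of switching a query to the RocksDB-based state store, assuming a PySpark SparkSession named spark. The configuration key spark.sql.streaming.stateStore.providerClass and the two provider class names below are the ones documented by Databricks and by Apache Spark 3.2 and later; the helper functions rocksdb_state_store_conf and enable_rocksdb_state_store are hypothetical. The setting is applied to the session before the streaming query is started.

```python
# Documented conf key and provider classes (Databricks runtime vs. Apache Spark 3.2+).
STATE_STORE_PROVIDER_KEY = "spark.sql.streaming.stateStore.providerClass"
DATABRICKS_ROCKSDB_PROVIDER = "com.databricks.sql.streaming.state.RocksDBStateStoreProvider"
OSS_ROCKSDB_PROVIDER = "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"


def rocksdb_state_store_conf(on_databricks=True):
    """Return the conf entries that select the RocksDB state store provider."""
    provider = DATABRICKS_ROCKSDB_PROVIDER if on_databricks else OSS_ROCKSDB_PROVIDER
    return {STATE_STORE_PROVIDER_KEY: provider}


def enable_rocksdb_state_store(spark, on_databricks=True):
    """Apply the RocksDB setting to a SparkSession; call before starting the query."""
    for key, value in rocksdb_state_store_conf(on_databricks).items():
        spark.conf.set(key, value)
```

For example, call enable_rocksdb_state_store(spark) in the notebook before df.writeStream ... .start(); on open-source Spark 3.2 or later, pass on_databricks=False. Keeping the class names in one function makes it easy to check which provider a job was configured with.
-->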
      <pubDate>Thu, 17 Jun 2021 23:14:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-streaming-aggregation-query-with-highly-variable-micro/m-p/25454#M17695</guid>
      <dc:creator>sajith_appukutt</dc:creator>
      <dc:date>2021-06-17T23:14:58Z</dc:date>
    </item>
  </channel>
</rss>

