<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issue with Multiple Stateful Operations in Databricks Structured Streaming in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108595#M9740</link>
    <pubDate>Mon, 03 Feb 2025 14:53:08 GMT</pubDate>
    <dc:creator>Alberto_Umana</dc:creator>
    <dc:date>2025-02-03T14:53:08Z</dc:date>
    <item>
      <title>Issue with Multiple Stateful Operations in Databricks Structured Streaming</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108579#M9739</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm working with Databricks Structured Streaming and have encountered an issue with stateful operations. Below is my pseudo-code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import functions as F

# Tolerate events that arrive up to 1 second late
df = df.withWatermark("timestamp", "1 second")

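# Header stream: the input rows with message_id extracted from the payload
df_header = df.withColumn("message_id", F.col("payload.id"))

# First stateful operator: explode the values and collect them per
# 10-second window, message_id, and name
df_values = df.withColumn("message_id", F.col("payload.id")) \
              .withColumn("values_exploded", F.explode("payload.values")) \
              .withColumn("name", F.col("values_exploded.name")) \
              .groupBy(F.window(F.col("timestamp"), "10 seconds"), F.col("message_id"), F.col("name")) \
              .agg(F.collect_list("values_exploded").alias("values_grouped"))

...

# Second stateful operator: re-window the output of the first aggregation
# and collect all per-name groups for each message_id
df_values_grouped = df_values.groupBy(F.window(df_values.window, "10 seconds"), F.col("message_id")) \
                             .agg(F.collect_list(F.struct("*")).alias("values"))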

final_df = df_header.join(df_values_grouped, "message_id", "inner")&lt;/LI-CODE&gt;&lt;P&gt;From my understanding, multiple stateful operations should be supported since Databricks Runtime 13.1 / Apache Spark 3.5.0. However, I am getting the following error:&lt;/P&gt;&lt;PRE&gt;Detected pattern of possible 'correctness' issue due to global watermark. The query contains stateful operations which can emit rows older than the current watermark plus allowed late record delay, which are "late rows" in downstream stateful operations and these rows can be discarded.&lt;/PRE&gt;&lt;P&gt;Why am I getting this error, and how can I fix it?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 13:05:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108579#M9739</guid>
      <dc:creator>fperry</dc:creator>
      <dc:date>2025-02-03T13:05:42Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with Multiple Stateful Operations in Databricks Structured Streaming</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108595#M9740</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/108864"&gt;@fperry&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;This error occurs because the query contains stateful operations that can emit rows older than the current watermark plus the allowed late record delay. These rows are considered "late rows" in downstream stateful operations and can be discarded.&amp;nbsp;You might need to adjust the watermark duration or the allowed late record delay to accommodate the lateness of your data. This can help prevent the discarding of late rows.&lt;/P&gt;
&lt;P&gt;If you understand the risks and still need to run the query, you can disable the correctness check by setting the following configuration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-python"&gt;spark.conf.set("spark.sql.streaming.statefulOperator.checkCorrectness.enabled", "false")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;However, do this with caution, as it can lead to correctness issues in your streaming application.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 14:53:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108595#M9740</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-03T14:53:08Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with Multiple Stateful Operations in Databricks Structured Streaming</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108599#M9741</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thank you for your reply. Can you maybe explain how adjusting the watermark duration would fix this issue? I just tested it with a duration of 10 minutes and left everything else the same. However, I'm still facing the same error.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 15:02:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108599#M9741</guid>
      <dc:creator>fperry</dc:creator>
      <dc:date>2025-02-03T15:02:43Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with Multiple Stateful Operations in Databricks Structured Streaming</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108618#M9742</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/108864"&gt;@fperry&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Which DBR version are you using? Could you please try with Databricks Runtime 13.3 LTS?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 16:32:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108618#M9742</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-03T16:32:19Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with Multiple Stateful Operations in Databricks Structured Streaming</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108620#M9743</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I'm using&amp;nbsp;&lt;SPAN&gt;DATABRICKS_RUNTIME_VERSION: 16.1&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 16:55:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108620#M9743</guid>
      <dc:creator>fperry</dc:creator>
      <dc:date>2025-02-03T16:55:27Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with Multiple Stateful Operations in Databricks Structured Streaming</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108655#M9744</link>
      <description>&lt;P&gt;According to this blog post, this should basically work, right? However, I'm getting the same error.&lt;BR /&gt;&lt;A href="https://www.databricks.com/blog/multiple-stateful-operators-structured-streaming" target="_blank"&gt;Multiple Stateful Streaming Operators | Databricks Blog&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Or am I missing something?&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import functions as F

# Synthetic source: emits one (timestamp, value) row per second
rate_df = spark.readStream.format("rate").option("rowsPerSecond", "1").load()

rate_df = rate_df.withWatermark("timestamp", "2 seconds")

# display(rate_df)

counts1 = rate_df.groupBy(F.window("timestamp", "10 seconds")).count()

counts2 = counts1.groupBy(F.window(F.window_time(counts1.window), "20 seconds")).count()

display(counts2)&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 03 Feb 2025 20:20:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/108655#M9744</guid>
      <dc:creator>fperry</dc:creator>
      <dc:date>2025-02-03T20:20:00Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with Multiple Stateful Operations in Databricks Structured Streaming</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/118909#M10014</link>
      <description>&lt;P&gt;Is there any solution for this error? Not a copy-paste of the error output, please.&lt;/P&gt;</description>
      <pubDate>Mon, 12 May 2025 13:58:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-multiple-stateful-operations-in-databricks-structured/m-p/118909#M10014</guid>
      <dc:creator>Adam_g</dc:creator>
      <dc:date>2025-05-12T13:58:33Z</dc:date>
    </item>
  </channel>
</rss>

