<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Structured streaming error- NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/152533#M53843</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp; Suppose I use foreachBatch: I might end up with duplicates, since state is not maintained across micro-batches.&lt;BR /&gt;Can you please share more information on max_by?&lt;/P&gt;</description>
    <pubDate>Mon, 30 Mar 2026 14:17:34 GMT</pubDate>
    <dc:creator>IM_01</dc:creator>
    <dc:date>2026-03-30T14:17:34Z</dc:date>
    <item>
      <title>Structured streaming error- NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/151315#M53624</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I was using the window functions row_number(), min, and sum in my code, and the Lakeflow SDP pipeline failed with the error: NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING - Window function is not supported on streaming dataframes.&lt;BR /&gt;&lt;SPAN&gt;What is the recommended approach to handle this scenario?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2026 17:13:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/151315#M53624</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-03-18T17:13:46Z</dc:date>
    </item>
    <item>
      <title>Re: Structured streaming error- NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/151327#M53627</link>
      <description>&lt;P&gt;Greetings&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;&amp;nbsp;, I did a little research and I have some helpful hints to share.&lt;/P&gt;
&lt;P class="p1"&gt;What you’re seeing isn’t a bug, and it’s not specific to Lakeflow SDP. It’s just how Spark Structured Streaming works.&lt;/P&gt;
&lt;P class="p1"&gt;At a high level, Structured Streaming only supports time-based windows built with &lt;SPAN class="s1"&gt;window()&lt;/SPAN&gt; on a timestamp column. Once you move into arbitrary SQL window functions — things like &lt;SPAN class="s1"&gt;row_number() over (...)&lt;/SPAN&gt;, &lt;SPAN class="s1"&gt;min() over (...)&lt;/SPAN&gt;, &lt;SPAN class="s1"&gt;sum() over (...)&lt;/SPAN&gt; — you’re outside what streaming can handle. That’s exactly why you’re hitting &lt;SPAN class="s1"&gt;NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P class="p1"&gt;So the real question becomes: what are you actually trying to compute? From there, the path usually falls into one of three patterns.&lt;/P&gt;
&lt;P class="p1"&gt;First, if you’re really after per-key, per-time-window aggregates, you’re in good shape — you just need to express it the “streaming way.” That means grouping by a time window and using watermarking to manage late data. Something like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import window, col, sum, min  # note: shadows Python's built-in sum/min

agg_df = (
  df
    # Tolerate events arriving up to 10 minutes late
    .withWatermark("event_time", "10 minutes")
    .groupBy(
      # 5-minute tumbling time window -- the only kind streaming supports
      window(col("event_time"), "5 minutes"),
      col("key_col")
    )
    .agg(
      sum("value").alias("value_sum"),
      min("value").alias("value_min")
    )
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;This keeps everything fully streaming and within the supported model.&lt;/P&gt;
&lt;P class="p1"&gt;Second, if you truly need analytic window functions — ranking, running totals, that kind of thing — streaming isn’t the right place to do it directly.&lt;/P&gt;
&lt;P class="p1"&gt;You’ve got two practical options.&lt;/P&gt;
&lt;P class="p1"&gt;The cleanest pattern is a two-step design. Use Lakeflow SDP (or standard streaming) for what it’s good at — filtering, deduping, time-windowed aggregations — and land the results in a Delta table. Then run a batch job (or non-streaming Lakeflow pipeline) on top of that where you can freely use &lt;SPAN class="s1"&gt;row_number()&lt;/SPAN&gt;, &lt;SPAN class="s1"&gt;min() over (...)&lt;/SPAN&gt;, etc. You just schedule that second step based on how fresh the data needs to be.&lt;/P&gt;
&lt;P class="p1"&gt;The other option is &lt;SPAN class="s1"&gt;foreachBatch&lt;/SPAN&gt;. If your logic doesn’t need state across micro-batches, you can treat each batch like a static DataFrame and apply window functions there. Just be careful: if your logic depends on historical context, you’ll need to pull in existing data (e.g., from your target table) and union it with the current batch before applying the window logic.&lt;/P&gt;
&lt;P class="p1"&gt;Third, a lot of the time &lt;SPAN class="s1"&gt;row_number()&lt;/SPAN&gt; is being used for a simpler goal — “give me the latest record per key.” If that’s the case, you don’t need window functions at all. Streaming already gives you better-native patterns:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Stateful aggregation (e.g., &lt;SPAN class="s1"&gt;max_by&lt;/SPAN&gt;-style logic)&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Watermarked dedup with &lt;/SPAN&gt;.dropDuplicates(key_cols + [time_col])&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
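&lt;P class="p1"&gt;Both patterns, sketched with placeholder column names (&lt;SPAN class="s1"&gt;max_by&lt;/SPAN&gt; requires Spark 3.3+):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import max_by, col

# Pattern 1: stateful "latest value per key" via max_by --
# keeps the value associated with the greatest event_time per key
latest = (
  df
    .withWatermark("event_time", "10 minutes")
    .groupBy("key_col")
    .agg(max_by(col("value"), col("event_time")).alias("latest_value"))
)

# Pattern 2: watermarked dedup -- drops repeats of the same (key, time) pair
deduped = (
  df
    .withWatermark("event_time", "10 minutes")
    .dropDuplicates(["key_col", "event_time"])
)&lt;/CODE&gt;&lt;/PRE&gt;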
&lt;P class="p1"&gt;It naturally follows that the constraint here isn’t really a limitation — it’s a nudge toward using patterns that are actually scalable in a streaming system.&lt;/P&gt;
&lt;P class="p1"&gt;Hope this helps, Louis.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2026 19:21:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/151327#M53627</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-03-18T19:21:57Z</dc:date>
    </item>
    <item>
      <title>Re: Structured streaming error- NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/151581#M53666</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Thanks for the response.&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;Can you please give more context on&amp;nbsp;&lt;SPAN&gt;"you’ll need to pull in existing data (e.g., from your target table) and union it with the current batch before applying the window logic"?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Could you also please share any documentation on the third option?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 21 Mar 2026 08:55:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/151581#M53666</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-03-21T08:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: Structured streaming error- NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/152533#M53843</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp; Suppose I use foreachBatch: I might end up with duplicates, since state is not maintained across micro-batches.&lt;BR /&gt;Can you please share more information on max_by?&lt;/P&gt;</description>
      <pubDate>Mon, 30 Mar 2026 14:17:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-error-non-time-window-not-supported-in/m-p/152533#M53843</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-03-30T14:17:34Z</dc:date>
    </item>
  </channel>
</rss>

