Greetings @IM_01, I did a little research and I have some helpful hints to share.
What you're seeing isn't a bug, and it's not specific to Lakeflow SDP. It's just how Spark Structured Streaming works.
At a high level, Structured Streaming only supports time-based windows built with window() on a timestamp column. Once you move into arbitrary SQL window functions, things like row_number() over (...), min() over (...), or sum() over (...), you're outside what streaming can handle. That's exactly why you're hitting NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING.
So the real question becomes: what are you actually trying to compute? From there, the path usually falls into one of three patterns.
First, if you're really after per-key, per-time-window aggregates, you're in good shape; you just need to express it the "streaming way." That means grouping by a time window and using watermarking to manage late data. Something like this:
```python
from pyspark.sql.functions import window, col, sum as spark_sum, min as spark_min

agg_df = (
    df
    .withWatermark("event_time", "10 minutes")
    .groupBy(
        window(col("event_time"), "5 minutes"),
        col("key_col"),
    )
    .agg(
        spark_sum("value").alias("value_sum"),
        spark_min("value").alias("value_min"),
    )
)
```
This keeps everything fully streaming and within the supported model.
Second, if you truly need analytic window functions (ranking, running totals, that kind of thing), streaming isn't the right place to do it directly. You've got two practical options.
The cleanest pattern is a two-step design. Use Lakeflow SDP (or standard streaming) for what it's good at (filtering, deduping, time-windowed aggregations) and land the results in a Delta table. Then run a batch job (or a non-streaming Lakeflow pipeline) on top of that, where you can freely use row_number(), min() over (...), etc. You just schedule that second step based on how fresh the data needs to be.
The other option is foreachBatch. If your logic doesn't need state across micro-batches, you can treat each batch like a static DataFrame and apply window functions there. Just be careful: if your logic depends on historical context, you'll need to pull in existing data (e.g., from your target table) and union it with the current batch before applying the window logic.
Third, a lot of the time row_number() is being used for a simpler goal: "give me the latest record per key." If that's the case, you don't need window functions at all. Streaming already gives you native patterns for this:
- Stateful aggregation (e.g., max_by-style logic)
- Watermarked dedup with .dropDuplicates(key_cols + [time_col])
In other words, the constraint here isn't really a limitation; it's a nudge toward patterns that actually scale in a streaming system.
Hope this helps, Louis.