<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Stream-stream window join after time window aggregation not working in 13.1 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/stream-stream-window-join-after-time-window-aggregation-not/m-p/3048#M230</link>
    <description>&lt;P&gt;Hey,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm trying to perform &lt;I&gt;Time window aggregation in two different streams followed by stream-stream window join&lt;/I&gt; described &lt;A href="https://docs.databricks.com/structured-streaming/stateful-streaming.html#time-window-aggregation-in-two-different-streams-followed-by-stream-stream-window-join" alt="https://docs.databricks.com/structured-streaming/stateful-streaming.html#time-window-aggregation-in-two-different-streams-followed-by-stream-stream-window-join" target="_blank"&gt;here&lt;/A&gt;. I'm running Databricks Runtime 13.1, exactly as advised.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, when I'm reproducing the following code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;clicksWindow = clicksWithWatermark.groupBy(
  clicksWithWatermark.clickAdId,
  window(clicksWithWatermark.clickTime, "1 hour")
).count()
&amp;nbsp;
impressionsWindow = impressionsWithWatermark.groupBy(
  impressionsWithWatermark.impressionAdId,
  window(impressionsWithWatermark.impressionTime, "1 hour")
).count()
&amp;nbsp;
clicksWindow.join(impressionsWindow, "window", "inner")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm not getting any results from the joined table in append mode. It is empty whether or not I include AdId in the groupBy. The behaviour is the same in Python and Scala.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I join on window.end instead of window, I start receiving results, but then I can only use an inner join (because the join column, window.end, is not a watermarked column). I need to do an outer join for my use case, and even with an inner join the state seems to grow indefinitely.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any help with reproducing this example is appreciated.&lt;/P&gt;</description>
    <pubDate>Thu, 15 Jun 2023 07:59:44 GMT</pubDate>
    <dc:creator>azera</dc:creator>
    <dc:date>2023-06-15T07:59:44Z</dc:date>
    <item>
      <title>Stream-stream window join after time window aggregation not working in 13.1</title>
      <link>https://community.databricks.com/t5/data-engineering/stream-stream-window-join-after-time-window-aggregation-not/m-p/3048#M230</link>
      <description>&lt;P&gt;Hey,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm trying to perform &lt;I&gt;Time window aggregation in two different streams followed by stream-stream window join&lt;/I&gt; described &lt;A href="https://docs.databricks.com/structured-streaming/stateful-streaming.html#time-window-aggregation-in-two-different-streams-followed-by-stream-stream-window-join" alt="https://docs.databricks.com/structured-streaming/stateful-streaming.html#time-window-aggregation-in-two-different-streams-followed-by-stream-stream-window-join" target="_blank"&gt;here&lt;/A&gt;. I'm running Databricks Runtime 13.1, exactly as advised.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, when I'm reproducing the following code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;clicksWindow = clicksWithWatermark.groupBy(
  clicksWithWatermark.clickAdId,
  window(clicksWithWatermark.clickTime, "1 hour")
).count()
&amp;nbsp;
impressionsWindow = impressionsWithWatermark.groupBy(
  impressionsWithWatermark.impressionAdId,
  window(impressionsWithWatermark.impressionTime, "1 hour")
).count()
&amp;nbsp;
clicksWindow.join(impressionsWindow, "window", "inner")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm not getting any results from the joined table in append mode. It is empty whether or not I include AdId in the groupBy. The behaviour is the same in Python and Scala.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I join on window.end instead of window, I start receiving results, but then I can only use an inner join (because the join column, window.end, is not a watermarked column). I need to do an outer join for my use case, and even with an inner join the state seems to grow indefinitely.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any help with reproducing this example is appreciated.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2023 07:59:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/stream-stream-window-join-after-time-window-aggregation-not/m-p/3048#M230</guid>
      <dc:creator>azera</dc:creator>
      <dc:date>2023-06-15T07:59:44Z</dc:date>
    </item>
    <item>
      <title>Re: Stream-stream window join after time window aggregation not working in 13.1</title>
      <link>https://community.databricks.com/t5/data-engineering/stream-stream-window-join-after-time-window-aggregation-not/m-p/3049#M231</link>
      <description>&lt;P&gt;Hi @Andrzej Zera&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Great to meet you, and thanks for your question!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let's see if your peers in the community have an answer to your question. Thanks.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Jun 2023 12:57:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/stream-stream-window-join-after-time-window-aggregation-not/m-p/3049#M231</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-18T12:57:32Z</dc:date>
    </item>
    <item>
      <title>Re: Stream-stream window join after time window aggregation not working in 13.1</title>
      <link>https://community.databricks.com/t5/data-engineering/stream-stream-window-join-after-time-window-aggregation-not/m-p/50018#M28687</link>
      <description>&lt;P&gt;Hey,&lt;BR /&gt;&lt;BR /&gt;I'm currently facing the same problem, so I would like to know if you've made any progress in resolving this issue.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Oct 2023 13:05:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/stream-stream-window-join-after-time-window-aggregation-not/m-p/50018#M28687</guid>
      <dc:creator>Happyfield7</dc:creator>
      <dc:date>2023-10-27T13:05:34Z</dc:date>
    </item>
  </channel>
</rss>

