<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Structured Streaming Delta Table - Reading and writing from same table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/structured-streaming-delta-table-reading-and-writing-from-same/m-p/40263#M27173</link>
    <description>&lt;P&gt;Thanks. Could you please point me to the thread/link that provides a solution for this? I have been blocked on this for a long time, and it would really help.&lt;/P&gt;</description>
    <pubDate>Thu, 17 Aug 2023 15:58:40 GMT</pubDate>
    <dc:creator>sparkrookie</dc:creator>
    <dc:date>2023-08-17T15:58:40Z</dc:date>
    <item>
      <title>Structured Streaming Delta Table - Reading and writing from same table</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-delta-table-reading-and-writing-from-same/m-p/40241#M27168</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a Structured Streaming job that reads from a Delta table "A" and writes to another Delta table "B".&lt;/P&gt;&lt;P&gt;A Schema - group_key, id, timestamp, value&lt;BR /&gt;B Schema - group_key, watermark_timestamp, derived_value&lt;/P&gt;&lt;P&gt;One requirement is that I need to get the max watermark_timestamp from "B" for each group (group_key) and then join that with "A" to keep only the messages in each group whose timestamp is &amp;gt; that group's watermark_timestamp. After processing that data and updating state, I need to get the max timestamp from those messages and append it to B's watermark_timestamp field for each group. Apart from this, I will also push some additional data into the derived_value column for use downstream.&lt;/P&gt;&lt;P&gt;Basically, the above ensures that already-processed data does not come into the stream again.&lt;/P&gt;&lt;P&gt;The problem is that I am reading from the same table that I am writing to. When I run the job with B as the sink, it does not succeed at all. When I change the sink to a different table, say C, it proceeds.&lt;/P&gt;&lt;P&gt;I have tried everything. I even tried collecting the max data per group from B before the stream starts. It still does not work.&lt;BR /&gt;&lt;BR /&gt;What's the solution for this? Could someone please help?&lt;/P&gt;&lt;P&gt;Additionally, in general, if I have a requirement where I need to buffer data for days, I don't want to store everything in memory, apply a watermark in arbitrary stateful processing, and then filter. What's the best way to solve this problem? I was thinking of using SQL queries, which is what my approach above does.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2023 14:24:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-delta-table-reading-and-writing-from-same/m-p/40241#M27168</guid>
      <dc:creator>sparkrookie</dc:creator>
      <dc:date>2023-08-17T14:24:01Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Streaming Delta Table - Reading and writing from same table</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-delta-table-reading-and-writing-from-same/m-p/40263#M27173</link>
      <description>&lt;P&gt;Thanks. Could you please point me to the thread/link that provides a solution for this? I have been blocked on this for a long time, and it would really help.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2023 15:58:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-delta-table-reading-and-writing-from-same/m-p/40263#M27173</guid>
      <dc:creator>sparkrookie</dc:creator>
      <dc:date>2023-08-17T15:58:40Z</dc:date>
    </item>
  </channel>
</rss>

