<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: drop duplicates within watermark in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/49448#M1602</link>
    <description>&lt;P&gt;Can any maintainer help me with this question?&lt;/P&gt;</description>
    <pubDate>Wed, 18 Oct 2023 08:16:48 GMT</pubDate>
    <dc:creator>aerofish</dc:creator>
    <dc:date>2023-10-18T08:16:48Z</dc:date>
    <item>
      <title>drop duplicates within watermark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/46354#M1225</link>
      <description>&lt;P&gt;We recently started using Structured Streaming to ingest data, and we want to use a watermark to drop duplicate events. However, we ran into some weird behavior and an unexpected exception. Can anyone explain what the expected behavior is and how these methods should be used correctly?&lt;/P&gt;&lt;P&gt;I have four scenarios:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Ingest from JSON file to Delta table: withWatermark + dropDuplicates&lt;BR /&gt;&lt;STRONG&gt;behavior&lt;/STRONG&gt;: it drops all duplicates within the watermark and also &lt;FONT color="#FF0000"&gt;drops all events (not only duplicated events)&lt;/FONT&gt; older than the watermark. Is this the expected behavior?&lt;/LI&gt;&lt;LI&gt;Ingest from Delta table to Delta table: withWatermark + dropDuplicates&lt;BR /&gt;&lt;STRONG&gt;behavior&lt;/STRONG&gt;: it drops all duplicates within the watermark and also drops duplicated events older than the watermark.&lt;/LI&gt;&lt;LI&gt;Ingest from Delta table to Delta table: withWatermark + dropDuplicatesWithinWatermark&lt;BR /&gt;&lt;STRONG&gt;behavior&lt;/STRONG&gt;: I tested the newly introduced method, dropDuplicatesWithinWatermark. Every time, it throws &lt;FONT color="#FF0000"&gt;error: java.util.NoSuchElementException: None.get&lt;/FONT&gt;. It's a generic exception. Can anyone explain why I get this error from just a basic invocation of dropDuplicatesWithinWatermark?&lt;/LI&gt;&lt;LI&gt;Ingest from JSON file to Delta table: withWatermark + dropDuplicatesWithinWatermark&lt;BR /&gt;&lt;STRONG&gt;behavior&lt;/STRONG&gt;: it drops duplicates within the watermark and also &lt;FONT color="#FF0000"&gt;drops every event&lt;/FONT&gt; older than the watermark. So the behavior differs from the 3rd scenario (same method, but Delta table to Delta table).&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Should I use dropDuplicatesWithinWatermark? It throws an exception when doing Delta table to Delta table ingestion. Is it a bug?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Sep 2023 07:16:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/46354#M1225</guid>
      <dc:creator>aerofish</dc:creator>
      <dc:date>2023-09-27T07:16:22Z</dc:date>
    </item>
    <item>
      <title>Re: drop duplicates within watermark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/46691#M1347</link>
      <description>&lt;P&gt;I can confirm we are also getting the same error in case no. 3:&lt;BR /&gt;&amp;gt;&amp;nbsp;&lt;SPAN&gt;Ingest from Delta table to Delta table: withWatermark + dropDuplicatesWithinWatermark&lt;/SPAN&gt;&lt;BR /&gt;&lt;STRONG&gt;behavior&lt;/STRONG&gt;&lt;SPAN&gt;: I tested the newly introduced method, dropDuplicatesWithinWatermark. Every time, it throws&amp;nbsp;&lt;/SPAN&gt;&lt;FONT color="#FF0000"&gt;error: java.util.NoSuchElementException: None.get&lt;/FONT&gt;&lt;SPAN&gt;. It's a generic exception.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Max_Liu_0-1695956349652.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/4107iDF5B45CFFECE0D8E/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Max_Liu_0-1695956349652.png" alt="Max_Liu_0-1695956349652.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Sep 2023 02:59:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/46691#M1347</guid>
      <dc:creator>Max_Liu</dc:creator>
      <dc:date>2023-09-29T02:59:16Z</dc:date>
    </item>
    <item>
      <title>Re: drop duplicates within watermark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/48723#M1511</link>
      <description>&lt;P&gt;Thanks for sharing your experience!&lt;/P&gt;&lt;P&gt;Waiting for more explanation and solutions...&lt;/P&gt;</description>
      <pubDate>Mon, 09 Oct 2023 05:49:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/48723#M1511</guid>
      <dc:creator>aerofish</dc:creator>
      <dc:date>2023-10-09T05:49:31Z</dc:date>
    </item>
    <item>
      <title>Re: drop duplicates within watermark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/49448#M1602</link>
      <description>&lt;P&gt;Can any maintainer help me with this question?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Oct 2023 08:16:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/drop-duplicates-within-watermark/m-p/49448#M1602</guid>
      <dc:creator>aerofish</dc:creator>
      <dc:date>2023-10-18T08:16:48Z</dc:date>
    </item>
  </channel>
</rss>

