<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: AvailableNow Trigger and failure in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/100978#M40496</link>
    <description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;When using the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger in Spark Structured Streaming, the behavior during a query failure is as follows:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;End Offset&lt;/STRONG&gt;: The initial end offset set by the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger does not change due to a query failure. The &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger processes all available data up to a specific point in time, and this end offset remains fixed even if the query fails.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Query Resumption&lt;/STRONG&gt;: If checkpointing is enabled, the query will resume from where it left off upon recovery. This means that the processing will continue from the last successfully processed offset, not from the beginning. The end offset remains the same as initially set by the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Failure vs. End of Query&lt;/STRONG&gt;: Spark Structured Streaming does differentiate between a query failure and the end of the query. A failure means the query did not complete successfully, and upon recovery, it will continue processing from the last checkpoint. The end of the query, in the context of &lt;CODE&gt;AvailableNow&lt;/CODE&gt;, means that all data up to the specified end offset has been processed.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;In summary, the end offset set by the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger remains unchanged during a query failure, and the query will resume from the last checkpointed position upon recovery.&lt;/P&gt;</description>
    <pubDate>Wed, 04 Dec 2024 21:38:29 GMT</pubDate>
    <dc:creator>Walter_C</dc:creator>
    <dc:date>2024-12-04T21:38:29Z</dc:date>
    <item>
      <title>AvailableNow Trigger and failure</title>
      <link>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/100963#M40491</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;What is the expected behavior of Spark Structured Streaming when a query using the AvailableNow trigger fails partway through? More specifically, what happens to the initial end offset? Does it change? It is clear that with checkpointing the query resumes where it left off, but what happens to the end offset? To some degree, this amounts to asking whether Spark Structured Streaming distinguishes between a failure and the end of the query.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2024 17:40:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/100963#M40491</guid>
      <dc:creator>Maatari</dc:creator>
      <dc:date>2024-12-04T17:40:12Z</dc:date>
    </item>
    <item>
      <title>Re: AvailableNow Trigger and failure</title>
      <link>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/100978#M40496</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;When using the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger in Spark Structured Streaming, the behavior during a query failure is as follows:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;End Offset&lt;/STRONG&gt;: The initial end offset set by the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger does not change due to a query failure. The &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger processes all available data up to a specific point in time, and this end offset remains fixed even if the query fails.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Query Resumption&lt;/STRONG&gt;: If checkpointing is enabled, the query will resume from where it left off upon recovery. This means that the processing will continue from the last successfully processed offset, not from the beginning. The end offset remains the same as initially set by the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Failure vs. End of Query&lt;/STRONG&gt;: Spark Structured Streaming does differentiate between a query failure and the end of the query. A failure means the query did not complete successfully, and upon recovery, it will continue processing from the last checkpoint. The end of the query, in the context of &lt;CODE&gt;AvailableNow&lt;/CODE&gt;, means that all data up to the specified end offset has been processed.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
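&lt;P class="_1t7bu9h1 paragraph"&gt;The resumption behavior in the list above depends on what the checkpoint records. As a rough illustration only, assuming a checkpoint on a local filesystem with the usual &lt;CODE&gt;offsets&lt;/CODE&gt; and &lt;CODE&gt;commits&lt;/CODE&gt; subdirectories (one file per batch ID), the hypothetical helper below reports whether the most recently planned batch was committed. The function names are illustrative and not part of any Spark API, and cloud storage paths would need a different way of listing files.&lt;/P&gt;

```python
from pathlib import Path


def latest_batch_id(log_dir):
    """Return the highest numeric file name in log_dir, or None if there is none."""
    if not log_dir.is_dir():
        return None
    # Batch files are named by batch ID; skip temp and checksum files.
    ids = [int(p.name) for p in log_dir.iterdir() if p.name.isdigit()]
    return max(ids, default=None)


def checkpoint_status(checkpoint_dir):
    """Classify a checkpoint as 'fresh', 'incomplete', or 'complete'.

    Assumes the layout checkpoint_dir/offsets and checkpoint_dir/commits.
    """
    root = Path(checkpoint_dir)
    planned = latest_batch_id(root / "offsets")
    committed = latest_batch_id(root / "commits")
    if planned is None:
        # No batch was ever planned: the query has not run yet.
        return "fresh"
    if committed is None or committed != planned:
        # A batch was planned but has no matching commit entry.
        return "incomplete"
    # Every planned batch has a matching commit entry.
    return "complete"
```

&lt;P class="_1t7bu9h1 paragraph"&gt;For example, &lt;CODE&gt;checkpoint_status("/tmp/checkpoints/my_query")&lt;/CODE&gt; (a hypothetical path) returns &lt;CODE&gt;"incomplete"&lt;/CODE&gt; when the highest batch ID under &lt;CODE&gt;offsets&lt;/CODE&gt; has no matching file under &lt;CODE&gt;commits&lt;/CODE&gt;, which is a state that a run failing in the middle of a batch can leave behind.&lt;/P&gt;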
&lt;P class="_1t7bu9h1 paragraph"&gt;In summary, the end offset set by the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger remains unchanged during a query failure, and the query will resume from the last checkpointed position upon recovery.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2024 21:38:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/100978#M40496</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-04T21:38:29Z</dc:date>
    </item>
    <item>
      <title>Re: AvailableNow Trigger and failure</title>
      <link>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/100984#M40500</link>
      <description>&lt;P&gt;Thank you so much, this is a really helpful answer.&lt;BR /&gt;&lt;BR /&gt;If I may, I would like to understand the mechanics under the hood a bit further. Would it be possible to share the classes involved? How does the AvailableNow trigger set up a context such that, when a query starts, it knows either that the end offset was not fully processed (so we are probably in a failure scenario) or that the end offset was consumed (so this is a new run and a new end offset can be fetched)? The interplay might come from somewhere else, I don't know, but I am keen to learn a bit more and get a sense of where to look for these things.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2024 22:40:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/100984#M40500</guid>
      <dc:creator>Maatari</dc:creator>
      <dc:date>2024-12-04T22:40:02Z</dc:date>
    </item>
    <item>
      <title>Re: AvailableNow Trigger and failure</title>
      <link>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/101078#M40530</link>
      <description>&lt;P&gt;The &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger processes all data that is available when the query starts, potentially across multiple micro-batches, and then stops. This differs from the default micro-batch and continuous modes, where the query keeps running and checking for new data. When a query starts with the &lt;CODE&gt;AvailableNow&lt;/CODE&gt; trigger, it determines whether the end offset (the point up to which data is to be processed) was previously processed. If the end offset was not processed, it indicates a failure scenario, and the system reprocesses the data starting from the last successfully processed offset. If the end offset was consumed, it signifies a new run, and the system fetches a new end offset so that it can process the data that has arrived since.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 14:01:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/availablenow-trigger-and-failure/m-p/101078#M40530</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-05T14:01:10Z</dc:date>
    </item>
  </channel>
</rss>

