<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to use checkpoint with change data feed in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-use-checkpoint-with-change-data-feed/m-p/5323#M1774</link>
    <description>&lt;P&gt;I have a scheduled job (running in continuous mode) with the following code:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt; (&lt;/P&gt;&lt;P&gt;    spark&lt;/P&gt;&lt;P&gt;    .readStream&lt;/P&gt;&lt;P&gt;    .option("checkpointLocation", databricks_checkpoint_location)&lt;/P&gt;&lt;P&gt;    .option("readChangeFeed", "true")&lt;/P&gt;&lt;P&gt;    .option("startingVersion", VERSION + 1)&lt;/P&gt;&lt;P&gt;    .table(databricks_source_table_raw_postgres_nft)&lt;/P&gt;&lt;P&gt;    .writeStream&lt;/P&gt;&lt;P&gt;    .foreachBatch(process_batch)&lt;/P&gt;&lt;P&gt;    .outputMode("append")&lt;/P&gt;&lt;P&gt;    .start()&lt;/P&gt;&lt;P&gt;  )&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I set `VERSION` to a number when I first start the job. However, when I restart the job, it starts from the same `VERSION` instead of resuming from the checkpoint. It looks like the checkpoint is not being used.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does checkpointing work with change data feed? If not, how can I ensure the job resumes where it stopped if it fails?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would like the `continuous` schedule to restart the workflow immediately after a failure, rather than restarting it with a manually set starting version.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Tue, 25 Apr 2023 04:14:29 GMT</pubDate>
    <dc:creator>Kit</dc:creator>
    <dc:date>2023-04-25T04:14:29Z</dc:date>
    <item>
      <title>How to use checkpoint with change data feed</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-checkpoint-with-change-data-feed/m-p/5323#M1774</link>
      <description>&lt;P&gt;I have a scheduled job (running in continuous mode) with the following code:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt; (&lt;/P&gt;&lt;P&gt;    spark&lt;/P&gt;&lt;P&gt;    .readStream&lt;/P&gt;&lt;P&gt;    .option("checkpointLocation", databricks_checkpoint_location)&lt;/P&gt;&lt;P&gt;    .option("readChangeFeed", "true")&lt;/P&gt;&lt;P&gt;    .option("startingVersion", VERSION + 1)&lt;/P&gt;&lt;P&gt;    .table(databricks_source_table_raw_postgres_nft)&lt;/P&gt;&lt;P&gt;    .writeStream&lt;/P&gt;&lt;P&gt;    .foreachBatch(process_batch)&lt;/P&gt;&lt;P&gt;    .outputMode("append")&lt;/P&gt;&lt;P&gt;    .start()&lt;/P&gt;&lt;P&gt;  )&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I set `VERSION` to a number when I first start the job. However, when I restart the job, it starts from the same `VERSION` instead of resuming from the checkpoint. It looks like the checkpoint is not being used.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does checkpointing work with change data feed? If not, how can I ensure the job resumes where it stopped if it fails?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would like the `continuous` schedule to restart the workflow immediately after a failure, rather than restarting it with a manually set starting version.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 25 Apr 2023 04:14:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-checkpoint-with-change-data-feed/m-p/5323#M1774</guid>
      <dc:creator>Kit</dc:creator>
      <dc:date>2023-04-25T04:14:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to use checkpoint with change data feed</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-checkpoint-with-change-data-feed/m-p/5325#M1776</link>
      <description>&lt;P&gt;Hi @Kit Yam Tse&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;</description>
      <pubDate>Sun, 07 May 2023 11:49:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-checkpoint-with-change-data-feed/m-p/5325#M1776</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-05-07T11:49:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to use checkpoint with change data feed</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-checkpoint-with-change-data-feed/m-p/45533#M27919</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;After running some tests here, it doesn't seem to work this way.&lt;/P&gt;&lt;P&gt;I'm streaming from a silver table to a gold table, and change data feed seems to ignore the checkpoint data. Whether or not I set a checkpoint location, if the starting version is not specified, the stream always starts from the latest version.&lt;/P&gt;&lt;P&gt;This means that if I stop the silver-to-gold stream, make some changes (generating multiple commit versions), and then resume the stream, the intermediate changes are not propagated to the gold table, resulting in data loss.&lt;BR /&gt;That's the behavior I'm seeing here.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Sep 2023 13:51:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-checkpoint-with-change-data-feed/m-p/45533#M27919</guid>
      <dc:creator>gmiguel</dc:creator>
      <dc:date>2023-09-21T13:51:59Z</dc:date>
    </item>
  </channel>
</rss>

