Databricks Community

Kit · 04-24-2023

I have a scheduled job (running in continuous mode) with the following code:

```
(
    spark
    .readStream
    .option("checkpointLocation", databricks_checkpoint_location)
    .option("readChangeFeed", "true")
    .option("startingVersion", VERSION + 1)
    .table(databricks_source_table_raw_postgres_nft)
    .writeStream
    .foreachBatch(process_batch)
    .outputMode("append")
    .start()
)
```

I set `VERSION` to a number when I first start the job. However, when I restart the job, it starts at the same `VERSION` instead of resuming from the checkpoint. It looks like the checkpoint is not being used.

Does checkpointing work with change data feed? If not, how can I ensure the job resumes where it stopped if it fails?

I would like the `continuous` schedule to restart the workflow immediately after a failure, instead of having to restart it with a manually set starting version.

Thanks
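A note on the snippet above: `checkpointLocation` is set on `readStream`. In Structured Streaming the checkpoint belongs to the streaming query, so the option is normally set on `writeStream`; passed to the reader, it is handed to the source as an ordinary option and is unlikely to create a checkpoint at that path. A minimal sketch of the same pipeline with the option moved is below. The function name `start_cdf_stream` and its parameters are hypothetical, chosen for illustration; `spark`, the table name, and `process_batch` stand for whatever the job already defines.

```python
def start_cdf_stream(spark, source_table, checkpoint_location,
                     process_batch, starting_version=None):
    """Start a change data feed stream whose progress is tracked in a checkpoint.

    Hypothetical helper: `starting_version` is only meant for the first run.
    """
    # Source options (change feed, optional starting version) go on the reader.
    reader = spark.readStream.option("readChangeFeed", "true")
    if starting_version is not None:
        reader = reader.option("startingVersion", starting_version)

    return (
        reader.table(source_table)
        .writeStream
        # The checkpoint is a property of the query, so it is set on the writer.
        .option("checkpointLocation", checkpoint_location)
        .foreachBatch(process_batch)
        .outputMode("append")
        .start()
    )
```

With the checkpoint on the writer, the Delta documentation states that `startingVersion` only takes effect when a new streaming query starts; once progress is recorded in the checkpoint, a restarted query should resume from those offsets. A fresh checkpoint directory would be needed to deliberately restart from a chosen version.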

gmiguel · 09-21-2023

@Retired_mod ,

After running some tests here, it doesn't seem to work this way.

I'm streaming from a silver table to a gold table, and it seems that change data feed is ignoring the checkpoint data. Whether or not I set a checkpoint location, if the starting version is not specified, the stream always starts from the latest version.

This means that if I stop the silver-to-gold stream, make some changes (generating multiple commit versions), and then resume the stream, the intermediate changes are not propagated to the gold table, resulting in data loss.
That's the behavior I'm seeing here.
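If intermediate versions were skipped in a scenario like this, one possible way to recover them is a one-off batch read of the change feed over the missed range. This is only a sketch, assuming the first and last missed commit versions are known. The function name and its parameters are hypothetical; `startingVersion` and `endingVersion` are the documented change data feed read options, and both bounds are inclusive.

```python
def read_missed_changes(spark, source_table,
                        first_missed_version, last_missed_version):
    """Batch-read change data feed rows for a closed range of commit versions.

    Hypothetical helper for backfilling a gap; returns a DataFrame that can be
    passed to the same logic the streaming job applies per micro-batch.
    """
    return (
        spark.read
        .option("readChangeFeed", "true")
        .option("startingVersion", first_missed_version)  # inclusive
        .option("endingVersion", last_missed_version)     # inclusive
        .table(source_table)
    )
```

The returned rows carry the `_change_type`, `_commit_version`, and `_commit_timestamp` columns, so checking the minimum and maximum `_commit_version` is a quick way to confirm which versions were actually read.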

How to use checkpoint with change data feed
