Data Engineering

How to use checkpoint with change data feed

Kit
New Contributor III

I have a scheduled job (running in continuous mode) with the following code:

```
(
    spark
    .readStream
    .option("checkpointLocation", databricks_checkpoint_location)
    .option("readChangeFeed", "true")
    .option("startingVersion", VERSION + 1)
    .table(databricks_source_table_raw_postgres_nft)
    .writeStream
    .foreachBatch(process_batch)
    .outputMode("append")
    .start()
)
```

I set `VERSION` to a number when I first start the job. However, I found that when I restart the job, it starts at the same `VERSION` instead of resuming from the checkpoint. It looks like the checkpoint is not being used.

Does checkpointing work with change data feed? If not, how can I ensure the job resumes from where it stopped if it fails?

I would like the `continuous` schedule to restart the workflow immediately after a failure, instead of having to restart it with a manually set starting version.

Thanks

2 REPLIES

@Retired_mod ,

After doing some tests here, it doesn't seem to work this way.

I'm streaming from a silver table to a gold table, and change data feed seems to ignore the checkpoint data. Whether or not I set a checkpoint location, if the starting version is not specified, the stream always starts from the latest version.

This means that if I stop the silver-to-gold stream, make some changes (generating multiple commit versions), and then resume the stream, the intermediate changes are not propagated to the gold table, resulting in data loss.
That's the behavior I'm seeing here.
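
A possible explanation for the behavior described in this thread, offered as a sketch rather than a confirmed diagnosis: in the snippet from the question, `checkpointLocation` is set on `readStream`. In Structured Streaming the checkpoint belongs to the query and is normally configured on `writeStream`, so an option set on the reader may simply be ignored, leaving the query without a durable checkpoint. The helper below uses a hypothetical name, `start_cdf_stream`, and its parameters stand in for the variables from the question. It moves the option to the writer and passes `startingVersion` only when one is supplied.

```python
def start_cdf_stream(spark, source_table, checkpoint_location,
                     process_batch, starting_version=None):
    """Start a change data feed stream with a writer-side checkpoint.

    `start_cdf_stream` and its parameter names are illustrative; they are
    not part of any Databricks or Spark API.
    """
    reader = spark.readStream.option("readChangeFeed", "true")

    if starting_version is not None:
        # Assumption: this is only needed to pick the starting point of a
        # brand-new query (one whose checkpoint directory is still empty).
        reader = reader.option("startingVersion", starting_version)

    changes = reader.table(source_table)

    return (
        changes.writeStream
        # The checkpoint location is set on the writer, not the reader.
        .option("checkpointLocation", checkpoint_location)
        .foreachBatch(process_batch)
        .outputMode("append")
        .start()
    )


# Usage with the names from the question (assumed to be defined elsewhere):
# query = start_cdf_stream(
#     spark,
#     databricks_source_table_raw_postgres_nft,
#     databricks_checkpoint_location,
#     process_batch,
#     starting_version=VERSION + 1,
# )
```

With a writer-side checkpoint in place, the documented behavior of the Delta streaming source is that `startingVersion` only applies when a query starts fresh. Once progress has been recorded in the checkpoint, the option is ignored and the stream resumes from the recorded offset, which should let a continuous job restart after a failure without updating `VERSION` by hand.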

 

Anonymous
Not applicable

Hi @Kit Yam Tse

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
