Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can I have a sequence guarantee when replicating with CDF?

Brad
Contributor II

Hi team,

I have a Delta table src, and I want to replicate it to another table tgt using CDF, along these lines:

 

(spark
    .readStream
    .format("delta")
    .option("readChangeFeed", "true")  # stream the Change Data Feed rather than the table data
    .table("src")
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "xxx")
    .toTable("tgt"))

 

Thanks to CDF, the tgt table will have a _commit_version column. Is it guaranteed that the _commit_version values appear in tgt in the same sequence as they were committed in src?

Thanks


Mounika_Tarigop
Databricks Employee

The _commit_version is part of the Delta Lake transaction log and is committed atomically with the new data. This means the changes are processed in the order in which they were committed in the source table. Make sure that CDF is enabled on your source Delta table (src), so that changes (inserts, updates, deletes) in the source table are captured.
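For reference, CDF can be enabled on an existing table with a table property (the table name src here matches the example in the question; swap in your own):

```sql
-- Enable the Change Data Feed on an existing Delta table
ALTER TABLE src SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
```

Note that CDF only records changes made after the property is set; earlier history is not backfilled.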

The code snippet you provided correctly sets up the streaming read and write. With this setup, the _commit_version in the target table (tgt) reflects the sequence in which changes were committed in the source table (src), so the target table is consistent with the source table in terms of commit order.

Brad
Contributor II

Thanks. If the replicated table has _commit_version in strict sequence, I can treat it as a globally ever-increasing column and consume deltas from it (e.g., in a batch fashion) with:

select * from replicated_tgt where _commit_version > (
    select max(_commit_version) as last_version_offset from downstream
)
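The incremental-consumption pattern in that query can be sketched in plain Python (the row shape and table contents are illustrative, not from the thread):

```python
# Toy model of batch consumption keyed on a monotonically increasing
# _commit_version column: each batch picks up only rows whose version is
# greater than the highest version already present downstream.

def consume_new_rows(replicated_tgt, downstream):
    """Return rows of replicated_tgt with _commit_version greater than
    the max _commit_version already consumed into downstream."""
    last_offset = max((r["_commit_version"] for r in downstream), default=-1)
    return [r for r in replicated_tgt if r["_commit_version"] > last_offset]

replicated_tgt = [
    {"id": 1, "_commit_version": 1},
    {"id": 2, "_commit_version": 2},
    {"id": 3, "_commit_version": 3},
]
downstream = [{"id": 1, "_commit_version": 1}]

new_rows = consume_new_rows(replicated_tgt, downstream)
# new_rows holds the rows with _commit_version 2 and 3
```

This only works as a watermark if _commit_version is strictly increasing across batches, which is exactly the guarantee being asked about above.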

Thanks.

 
