<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can I have sequence guarantee when replicate with CDF in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/can-i-have-sequence-guarantee-when-replicate-with-cdf/m-p/98722#M39817</link>
    <description>&lt;P&gt;Thanks. If the replicated table keeps _commit_version in strict sequence, I can treat it as a globally increasing column and consume the changes incrementally (e.g. in batches) with:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select * from replicated_tgt where _commit_version &amp;gt; (
    select max(_commit_version) as last_version_offset from downstream
)&lt;/LI-CODE&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Wed, 13 Nov 2024 22:54:56 GMT</pubDate>
    <dc:creator>MikeGo</dc:creator>
    <dc:date>2024-11-13T22:54:56Z</dc:date>
    <item>
      <title>Can I have sequence guarantee when replicate with CDF</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-have-sequence-guarantee-when-replicate-with-cdf/m-p/98175#M39631</link>
      <description>&lt;P&gt;Hi team,&lt;/P&gt;&lt;P&gt;I have a Delta table src that I want to replicate to another table tgt using CDF, roughly like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;(spark
    .readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .table('src')
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", 'xxx')
    .toTable('tgt'))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks to CDF, the tgt table gets a&amp;nbsp;&lt;SPAN&gt;_commit_version column. Is there a guarantee that the _commit_version values appear in tgt in the same sequence as in the src table?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 08 Nov 2024 08:29:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-have-sequence-guarantee-when-replicate-with-cdf/m-p/98175#M39631</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-11-08T08:29:33Z</dc:date>
    </item>
    <item>
      <title>Re: Can I have sequence guarantee when replicate with CDF</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-have-sequence-guarantee-when-replicate-with-cdf/m-p/98717#M39815</link>
      <description>&lt;P class="p1"&gt;The _commit_version is part of the Delta Lake transaction log and is committed at the same time as the new data. This means that changes are processed in the order in which they were committed to the source table. Make sure that CDF is enabled on your source Delta table (src), so that its changes (inserts, updates, deletes) are captured.&lt;/P&gt;
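&lt;P class="p1"&gt;As a minimal sketch, assuming the source table is registered under the name src as in the question, CDF can be enabled by setting a table property. Only changes committed after the property is set are recorded in the change feed:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;-- Assumes a Delta table named src, as in the question
ALTER TABLE src SET TBLPROPERTIES (delta.enableChangeDataFeed = true);&lt;/LI-CODE&gt;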
&lt;P class="p1"&gt;The code snippet you provided correctly sets up the streaming read and write operations. With it, the _commit_version in the target table (tgt) reflects the sequence of changes as they occurred in the source table (src), so the target table is consistent with the source table in terms of commit order.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2024 20:47:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-have-sequence-guarantee-when-replicate-with-cdf/m-p/98717#M39815</guid>
      <dc:creator>Mounika_Tarigop</dc:creator>
      <dc:date>2024-11-13T20:47:04Z</dc:date>
    </item>
    <item>
      <title>Re: Can I have sequence guarantee when replicate with CDF</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-have-sequence-guarantee-when-replicate-with-cdf/m-p/98722#M39817</link>
      <description>&lt;P&gt;Thanks. If the replicated table keeps _commit_version in strict sequence, I can treat it as a globally increasing column and consume the changes incrementally (e.g. in batches) with:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select * from replicated_tgt where _commit_version &amp;gt; (
    select max(_commit_version) as last_version_offset from downstream
)&lt;/LI-CODE&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2024 22:54:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-have-sequence-guarantee-when-replicate-with-cdf/m-p/98722#M39817</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-11-13T22:54:56Z</dc:date>
    </item>
  </channel>
</rss>

