<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: incorrect commit timestamp after deep clone. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126556#M47715</link>
    <description>&lt;P&gt;Do we need to handle time synchronization on serverless? We are planning to move from job clusters to serverless, and because we use _commit_timestamp from CDF, this could cause issues.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;sugun&lt;/P&gt;</description>
    <pubDate>Sat, 26 Jul 2025 21:26:54 GMT</pubDate>
    <dc:creator>sugunk</dc:creator>
    <dc:date>2025-07-26T21:26:54Z</dc:date>
    <item>
      <title>incorrect commit timestamp after deep clone.</title>
      <link>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126518#M47708</link>
      <description>&lt;P&gt;I deep cloned a table and then ran an update, but the update's commit timestamp is earlier than the timestamp of the deep clone (version 0).&lt;BR /&gt;It looks like there is an issue with deep clone.&lt;/P&gt;&lt;DIV&gt;Here is the output; the _commit_timestamp order is not in sync with _commit_version:&lt;BR /&gt;&lt;BR /&gt;&lt;TABLE width="567"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="160"&gt;timecard_transaction_id&lt;/TD&gt;&lt;TD width="123"&gt;_change_type&lt;/TD&gt;&lt;TD width="112"&gt;_commit_version&lt;/TD&gt;&lt;TD width="172"&gt;_commit_timestamp&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;214920856&lt;/TD&gt;&lt;TD&gt;update_preimage&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2025-07-26T02:27:25+10:00&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;214920856&lt;/TD&gt;&lt;TD&gt;update_postimage&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2025-07-26T02:27:25+10:00&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;214920856&lt;/TD&gt;&lt;TD&gt;insert&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;2025-07-26T09:27:33+10:00&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/DIV&gt;</description>
      <pubDate>Sat, 26 Jul 2025 03:29:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126518#M47708</guid>
      <dc:creator>sugunk</dc:creator>
      <dc:date>2025-07-26T03:29:41Z</dc:date>
    </item>
    <item>
      <title>Re: incorrect commit timestamp after deep clone.</title>
      <link>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126553#M47713</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176753"&gt;@sugunk&lt;/a&gt;&lt;/P&gt;&lt;P&gt;You're right to be confused; this behavior looks wrong at first glance. Let's break down what is happening.&lt;BR /&gt;This is not a bug in Delta Lake but a consequence of how _commit_timestamp works:&lt;BR /&gt;_commit_timestamp reflects the wall-clock time at the source cluster/node performing the commit, so it does not necessarily follow version order.&lt;/P&gt;&lt;P&gt;So:&lt;BR /&gt;- Deep clone (version 0): the clone was committed at 2025-07-26 09:27:33+10:00.&lt;BR /&gt;- Update (version 1): happened later logically (in terms of Delta version), but the commit was stamped 2025-07-26 02:27:25+10:00, for example because it ran on a node with an earlier system time or the cluster had clock skew.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Key Points to Know&lt;/STRONG&gt;&lt;BR /&gt;1. _commit_version is always consistent and monotonically increasing.&lt;BR /&gt;- Trust versions, not _commit_timestamp, for lineage and time travel.&lt;/P&gt;&lt;P&gt;2. _commit_timestamp is not guaranteed to be monotonic, especially across clusters, jobs, or time zones.&lt;/P&gt;&lt;P&gt;3. Commit timestamps are not strictly ordered and should not be used as a proxy for Delta transaction order. The Delta Lake issue linked below describes the cause:&lt;BR /&gt;"Delta currently relies on the file modification time to identify the timestamp of a commit... this can easily change when files are copied or moved... The possibility of non-monotonic file timestamps also adds lots of code complexity.."&lt;BR /&gt;&lt;A href="https://github.com/delta-io/delta/issues/2532?utm_source=chatgpt.com" target="_blank" rel="noopener"&gt;https://github.com/delta-io/delta/issues/2532?utm_source=chatgpt.com&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Recommendations&lt;/STRONG&gt;&lt;BR /&gt;- When tracking history, always use _commit_version to order changes.&lt;BR /&gt;- If consistent _commit_timestamp values are required (e.g., for audits), ensure cluster clocks are synchronized via NTP.&lt;BR /&gt;- You can also add a column such as event_timestamp to your data to track true event times.&lt;/P&gt;</description>
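      <!--
The recommendation to order changes by version rather than by timestamp can be sketched in plain Python. This is a minimal illustration, assuming the three change feed rows from the question have been collected into dictionaries; the helper name find_timestamp_inversions is hypothetical and is not part of Delta Lake or Databricks. It orders rows by _commit_version and reports any version whose _commit_timestamp is earlier than a timestamp already seen at a lower version.

```python
from datetime import datetime

# Rows copied from the table in the question (CDF metadata columns only).
rows = [
    {"_change_type": "update_preimage", "_commit_version": 1,
     "_commit_timestamp": "2025-07-26T02:27:25+10:00"},
    {"_change_type": "update_postimage", "_commit_version": 1,
     "_commit_timestamp": "2025-07-26T02:27:25+10:00"},
    {"_change_type": "insert", "_commit_version": 0,
     "_commit_timestamp": "2025-07-26T09:27:33+10:00"},
]


def find_timestamp_inversions(rows):
    """Hypothetical helper: return (version, timestamp) pairs whose timestamp
    is earlier than the latest timestamp seen at any lower version."""
    # Keep one timestamp per version; all rows of one commit share it.
    per_version = {}
    for row in rows:
        per_version[row["_commit_version"]] = datetime.fromisoformat(
            row["_commit_timestamp"]
        )

    inversions = []
    latest_seen = None
    for version in sorted(per_version):
        ts = per_version[version]
        if latest_seen is not None and ts < latest_seen:
            inversions.append((version, ts))
        else:
            latest_seen = ts
    return inversions


# Sorting by version recovers the logical order: the insert, then the update.
ordered = sorted(rows, key=lambda row: row["_commit_version"])
inversions = find_timestamp_inversions(rows)
```

With the data from the question, ordering by version puts the insert first, and version 1 is flagged because its timestamp precedes that of version 0.
      -->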
      <pubDate>Sat, 26 Jul 2025 18:13:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126553#M47713</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-07-26T18:13:06Z</dc:date>
    </item>
    <item>
      <title>Re: incorrect commit timestamp after deep clone.</title>
      <link>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126556#M47715</link>
      <description>&lt;P&gt;Do we need to handle time synchronization on serverless? We are planning to move from job clusters to serverless, and because we use _commit_timestamp from CDF, this could cause issues.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;sugun&lt;/P&gt;</description>
      <pubDate>Sat, 26 Jul 2025 21:26:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126556#M47715</guid>
      <dc:creator>sugunk</dc:creator>
      <dc:date>2025-07-26T21:26:54Z</dc:date>
    </item>
    <item>
      <title>Re: incorrect commit timestamp after deep clone.</title>
      <link>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126557#M47716</link>
      <description>&lt;P&gt;Yes, this is important to consider if you're switching to Databricks serverless, especially if your data pipelines use _commit_timestamp from Change Data Feed (CDF) to track or filter changes.&lt;/P&gt;&lt;P&gt;On serverless, you can't control or guarantee the exact system time on the machines running your jobs. So if you use _commit_timestamp to decide which data is new or changed, it might not always be accurate or in the correct order, which could cause your pipeline to miss or duplicate changes.&lt;/P&gt;</description>
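      <!--
The missed-change risk described in this reply can be illustrated with a small pure-Python simulation; nothing here calls Databricks or Delta Lake. The Change record and both read_since helpers are hypothetical names introduced for the example. It compares a watermark kept on _commit_timestamp with one kept on _commit_version, using the two commits and timestamps from the question.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    version: int          # stands in for _commit_version
    timestamp: datetime   # stands in for _commit_timestamp
    change_type: str


# The feed from the question: version 1 carries an earlier timestamp than version 0.
feed = [
    Change(0, datetime.fromisoformat("2025-07-26T09:27:33+10:00"), "insert"),
    Change(1, datetime.fromisoformat("2025-07-26T02:27:25+10:00"), "update_postimage"),
]


def read_since_timestamp(feed, watermark):
    """Timestamp watermark: keep changes stamped strictly after the watermark."""
    return [c for c in feed if c.timestamp > watermark]


def read_since_version(feed, last_version):
    """Version watermark: keep changes committed after the last processed version."""
    return [c for c in feed if c.version > last_version]


# First run processes version 0 and stores both kinds of watermark.
ts_watermark = feed[0].timestamp
version_watermark = feed[0].version

# Second run: the update (version 1) has been committed in the meantime.
missed_by_timestamp = read_since_timestamp(feed, ts_watermark)
found_by_version = read_since_version(feed, version_watermark)
```

Here the timestamp watermark returns nothing on the second run, so the update is silently skipped, while the version watermark returns it. In a Spark job the same idea would mean persisting the last processed _commit_version and passing the next version as the startingVersion option when reading the change feed, instead of filtering on _commit_timestamp.
      -->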
      <pubDate>Sat, 26 Jul 2025 21:50:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/incorrect-commit-timestamp-after-deep-clone/m-p/126557#M47716</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-07-26T21:50:35Z</dc:date>
    </item>
  </channel>
</rss>

