lingareddy_Alva
Esteemed Contributor

Hi @sugunk 

You're right to be confused: this behavior doesn't look right at first. Let's break it down and see what's really happening.
This is not a bug in Delta Lake but rather a quirk of how commit_timestamp works:
commit_timestamp reflects the wall-clock time of the cluster/node that performed the commit, so it does not necessarily follow version order.

So:
- Deep Clone (Version 0): When you cloned the table, it was committed at 2025-07-26 09:27:33+10:00.
- Update (Version 1): Happened later logically (in terms of Delta version), but was recorded with an earlier timestamp (e.g., 2025-07-26 02:27:25+10:00), either because the commit ran on a node with an earlier system time or because your cluster had clock skew.

Key Points to Know
1. commit_version is always consistent and monotonically increasing.
- Trust versioning for lineage and time travel, not commit_timestamp.

2. commit_timestamp is not guaranteed to be monotonic, especially across clusters, jobs, or time zones.

3. This behavior is documented (though subtly) in Delta Lake specs:
"commit timestamps are not strictly ordered and should not be used as a proxy for Delta transaction order."
The related Delta Lake GitHub issue adds:
"Delta currently relies on the file modification time to identify the timestamp of a commit... this can easily change when files are copied or moved... The possibility of non-monotonic file timestamps also adds lots of code complexity..."
https://github.com/delta-io/delta/issues/2532

Recommendations
- When tracking history, always use commit_version for order of changes.
- If consistency of commit_timestamp is required (e.g., for audits), ensure cluster time synchronization via NTP.
- You can enrich your metadata with a column like event_timestamp inside your data to track true event times.
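
To make the first recommendation concrete, here is a minimal plain-Python sketch (no Spark required). It assumes you have already pulled the (commit_version, commit_timestamp) pairs out of your table history into a list. The `history` list and both helper names are hypothetical and only for illustration. The two timestamps are the ones from this thread.

```python
from datetime import datetime

# Hypothetical history rows as (commit_version, commit_timestamp),
# using the two timestamps quoted in this thread.
history = [
    (1, datetime.fromisoformat("2025-07-26T02:27:25+10:00")),
    (0, datetime.fromisoformat("2025-07-26T09:27:33+10:00")),
]


def order_by_version(rows):
    """Return rows in true commit order, i.e. sorted by commit_version."""
    return sorted(rows, key=lambda row: row[0])


def find_timestamp_inversions(rows):
    """Return (older_version, newer_version, gap) for each pair of
    consecutive versions where the newer version has an earlier timestamp."""
    ordered = order_by_version(rows)
    inversions = []
    for (prev_v, prev_ts), (cur_v, cur_ts) in zip(ordered, ordered[1:]):
        if cur_ts < prev_ts:
            inversions.append((prev_v, cur_v, prev_ts - cur_ts))
    return inversions


for older, newer, gap in find_timestamp_inversions(history):
    print(f"version {newer} is stamped {gap} earlier than version {older}")
```

Sorting by commit_version gives the real order of changes, and the inversion check only flags where commit_timestamp disagrees with it, so you can spot the affected versions without relying on the timestamps for ordering.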

 

LR
