- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Monday
Greetings @GeKo
Good question.
Short answer: treat the Delta transaction log as your source of truth. Every write, delete, or merge on a Unity Catalog table creates a commit with a timestamp and operation type. DESCRIBE HISTORY gives you access to all of it.
The key move is filtering to only data-changing operations so you're not picking up maintenance noise like OPTIMIZE or VACUUM:
SELECT
max(timestamp) AS last_data_change_ts
FROM (
DESCRIBE HISTORY catalog_name.schema_name.table_name
)
WHERE operation IN ('WRITE', 'DELETE', 'MERGE', 'UPDATE');
You can broaden or tighten that operation filter depending on what counts as a "data change" in your environment. RESTORE, TRUNCATE, and CREATE TABLE AS SELECT are also valid candidates depending on your use case.
Python equivalent, ready for a scheduled job:
table_name = "catalog_name.schema_name.table_name"
df = spark.sql(f"""
SELECT max(timestamp) AS last_data_change_ts
FROM (
DESCRIBE HISTORY {table_name}
)
WHERE operation IN ('WRITE', 'DELETE', 'MERGE', 'UPDATE')
""")
last_ts = df.collect()[0]["last_data_change_ts"]
print(f"Last data change for {table_name}: {last_ts}")
From there you can write last_ts into a monitoring table, compare it to current time, or trigger an alert if the table hasn't been updated in a while. This only reads the transaction log, not the table data, so it's lightweight.
If you don't need to distinguish between operation types and just want a quick timestamp, DESCRIBE DETAIL is a simpler option:
DESCRIBE DETAIL catalog_name.schema_name.table_name
That gives you a lastModified column reflecting the most recent modification. Less granular, but gets the job done for simpler checks.
One gotcha worth flagging: don't rely on the last_altered column in system.information_schema.tables. It only tracks DDL changes like schema modifications and table properties. It does not reflect DML operations (inserts, updates, deletes). If someone writes new rows to your table, last_altered won't move. This bites people regularly.
A couple things to keep in mind:
- Delta table history is retained for 30 days by default (controlled by delta.logRetentionDuration). For most scheduled monitoring use cases that's plenty, but good to know the boundary.
- If you ever need row-level change tracking rather than table-level, look into Change Data Feed. Different tool for a different job, but worth knowing it's there.
Hope that helps — let us know how it goes.
Cheers, Lou