Louis_Frolio
Databricks Employee

Greetings @GeKo 

Good question.

Short answer: treat the Delta transaction log as your source of truth. Every write, delete, or merge on a Unity Catalog table creates a commit with a timestamp and operation type. DESCRIBE HISTORY gives you access to all of it.

The key move is filtering to only data-changing operations so you're not picking up maintenance noise like OPTIMIZE or VACUUM:

SELECT
  max(timestamp) AS last_data_change_ts
FROM (
  DESCRIBE HISTORY catalog_name.schema_name.table_name
)
WHERE operation IN ('WRITE', 'DELETE', 'MERGE', 'UPDATE');

You can broaden or tighten that operation filter depending on what counts as a "data change" in your environment; RESTORE, TRUNCATE, and CREATE TABLE AS SELECT are also worth including in many setups.
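If you'd rather apply that filter client-side (for example on rows you've already collected via the Delta Lake Python API's DeltaTable.history()), the logic is a few lines of plain Python. A minimal sketch, where history_rows and the example rows are stand-ins for real DESCRIBE HISTORY output:

```python
from datetime import datetime

# Operations that count as data changes; extend with RESTORE,
# TRUNCATE, etc. to match your environment.
DATA_CHANGE_OPS = {"WRITE", "DELETE", "MERGE", "UPDATE"}

def last_data_change(history_rows):
    """Return the newest timestamp among data-changing commits,
    or None if the history contains no such operation."""
    timestamps = [
        row["timestamp"]
        for row in history_rows
        if row["operation"] in DATA_CHANGE_OPS
    ]
    return max(timestamps, default=None)

# Stand-in rows for illustration (real rows come from
# DESCRIBE HISTORY or DeltaTable.history().collect()):
rows = [
    {"operation": "OPTIMIZE", "timestamp": datetime(2024, 5, 3)},
    {"operation": "WRITE", "timestamp": datetime(2024, 5, 1)},
]
print(last_data_change(rows))  # picks the WRITE commit, ignores OPTIMIZE
```

Note how the OPTIMIZE commit, although newer, is skipped: that's the same maintenance-noise filtering the SQL version does.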

Python equivalent, ready for a scheduled job:

table_name = "catalog_name.schema_name.table_name"

df = spark.sql(f"""
  SELECT max(timestamp) AS last_data_change_ts
  FROM (
    DESCRIBE HISTORY {table_name}
  )
  WHERE operation IN ('WRITE', 'DELETE', 'MERGE', 'UPDATE')
""")

last_ts = df.collect()[0]["last_data_change_ts"]
print(f"Last data change for {table_name}: {last_ts}")

From there you can write last_ts into a monitoring table, compare it to current time, or trigger an alert if the table hasn't been updated in a while. This only reads the transaction log, not the table data, so it's lightweight.
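The alerting piece is just a timestamp comparison. A minimal sketch of the staleness check, where the 24-hour threshold is a hypothetical value you'd tune for your tables:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: flag tables with no data change in 24 hours.
STALENESS_THRESHOLD = timedelta(hours=24)

def is_stale(last_ts, now=None, threshold=STALENESS_THRESHOLD):
    """True if the last data change is older than the threshold,
    or if no data-changing commit was found at all."""
    if last_ts is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_ts > threshold

# Example: a table last written 30 hours ago is flagged as stale.
now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
last_write = datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc)
print(is_stale(last_write, now=now))  # True
```

In a scheduled job you'd feed last_ts from the query above into a check like this and fire your alert of choice (email, webhook, etc.) when it returns True.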

If you don't need to distinguish between operation types and just want a quick timestamp, DESCRIBE DETAIL is a simpler option:

DESCRIBE DETAIL catalog_name.schema_name.table_name

That gives you a lastModified column with the timestamp of the most recent commit. Less granular, since it doesn't distinguish data changes from maintenance operations like OPTIMIZE, but it gets the job done for simpler checks.

One gotcha worth flagging: don't rely on the last_altered column in system.information_schema.tables. It only tracks DDL changes like schema modifications and table properties. It does not reflect DML operations (inserts, updates, deletes). If someone writes new rows to your table, last_altered won't move. This bites people regularly.

A couple things to keep in mind:

  • Delta table history is retained for 30 days by default (controlled by delta.logRetentionDuration). For most scheduled monitoring use cases that's plenty, but good to know the boundary.
  • If you ever need row-level change tracking rather than table-level, look into Change Data Feed. Different tool for a different job, but worth knowing it's there.

Hope that helps — let us know how it goes.

Cheers, Lou