- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Monday - last edited Monday
I think it may be a trick question. There is not necessarily a single, always-available, best way, since there can be a diversity of object types, configurations, and load patterns for tables in Unity Catalog. The ways that are possible or that are best may depend on the data that (can be) available describing the object.
- If I am wrong, please correct me.
- Some options are below. An order of preference below is numeric, with more-preferred methods at the top.
- Please add on, critique, or suggest better ones. (I probably missed some methods.)
Method 1: Check the _metadata.file_modification_time attribute of the object (if it exists; add it if needed/possible)
Some but not all tables/objects have a _metadata column
- Table are stored as files.
- To know when the table was last updated, get the last file modification time underlying the table.
- See: https://learn.microsoft.com/en-us/azure/databricks/ingestion/file-metadata-column
SELECT
'catalog.schema.your_table' AS object_name,
MAX(_metadata.file_modification_time) AS max_ts
FROM
catalog.schema.your_table
WHERE
_metadata.file_modification_time >= CURRENT_TIMESTAMP() - INTERVAL 7 DAYS
Method 2: Consume DESCRIBE HISTORY results for the object
Some tables history can be found by inspecting HISTORY (Questions to consider: Is there a slight lag for HISTORY being updated? Do all objects necessarily have HISTORY?) Skeleton:
SELECT
'catalog.schema.your_table' AS object_name,
MAX(timestamp) as max_ts
FROM
(DESCRIBE HISTORY catalog.schema.your_table);
Method 3: Check the last time of a successful job, for the job that loads that object (if you know the timing)
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Outline below -- you might have good luck asking an LLM to write it using Databricks ai-dev-tools for reference
# 1. Based on the job name or id that you know, get the job states
# 2. Find the job run with the latest successful state
# 3. That's the runtime of the last job that loaded the tabl
# 4. The data is not fresher than that, but would have data up to t
Method 4: Keep your own watermark table
A solution outline is omitted for brevity, since building your own metadata table is not a great answer for "where to look it up", but hopefully you get the idea -- have the last job write a time when it is done. This can have certain pros/cons, and is probably not what the OP was asking for.