Friday
I'm trying to incrementally backup system.information_schema.table_privileges but facing challenges:
No streaming support: Is streaming supported: False
No unique columns for MERGE: All columns contain common values, no natural key combination
No timestamp columns availability: Can't track changes incrementally
Table schema:
Problems:
Spark streaming fails: [UNSUPPORTED_FEATURE.TABLE_OPERATION]
Batch MERGE fails: No reliable unique key columns
No last_altered/created_time for incremental logic
Any solution would be really helpful!
Friday
information_schema is not a Delta Table, which is why you can't stream from it. They are basically views on top of the information coming straight from the control plane database.
Also your query is actually going to be quite slow/expensive (you probably have noted that a select * type query on it is slow since it is a union of all the views from each individual catalog's information_schema).
What's the goal here with trying to maintain a "daily" snapshot?
Friday
Since we need the previous permissions of a table that was dropped, I thought taking a backup of the system.information_schema.table_privileges table would be helpful. any idea if there is an alternative way to retrieve this information?
Friday
Got it. You could try going it another way, with the audit tables. You can tweak this query to add whatever actions you want to monitor, extract the securable name, changes, securable_type etc and run it every day (1 day interval) / compare over time. You could even build alerts off it (let's say you really care about a specific catalog/schema/table) etc.
Query:
select request_params, *
from `system`.`access`.`audit`
WHERE
service_name = 'unityCatalog'
and action_name = 'updatePermissions'
and event_time >= current_timestamp() - interval 30 day
-- and workspace_id = 'your workspace id' -- optional
;
Friday
Nice catch with audit table, and actually, that one can be read as CDF stream, and we can pass action_name to filter
Friday
Thanks for pointing that out Hubert.
It's quite confusing to most (all?) users that "information_schema" is the exception inside the system catalog.
You can stream from all other schemas since they are delta-shares. but information_schema is special ๐