cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Missing workspaces in workspaces_latest but present in audit

Danish11052000
Contributor

While validating workspace coverage, we observed that some workspace_id exist in system.access.audit but return NULL in system.access.workspaces_latest.

 
 
select distinct a.workspace_id, w.workspace_name
from system.access.audit a
left join system.access.workspaces_latest w
on w.workspace_id = a.workspace_id
where w.workspace_name is null;โ€‹



Is there a known reason behind this behavior?
Would really appreciate it if anyone is aware and can share insights.
1 ACCEPTED SOLUTION

Accepted Solutions

Ashwin_DSA
Databricks Employee
Databricks Employee

Hey @Danish11052000,

Yes... This is expected and documented behaviour, not a bug.

system.access.workspaces_latest contains only active workspaces in the account. When a workspace is cancelled/removed from the account, its row is removed from this table. 

Ashwin_DSA_0-1776239927759.png

system.access.audit is a 365-day history of audit events for workspaces in the region, and it continues to store events even after a workspace has been deleted.

So when you run that particular query in your post, the rows you see are typically events for workspaces that used to exist (or were recently deprovisioned) but are no longer present in workspaces_latest. Because workspaces_latest is a slow-changing dimension of the latest state of active workspaces. It wonโ€™t retain those historical entries, while audit will.

If youโ€™re validating coverage, you can interpret workspace_ids that donโ€™t resolve in workspaces_latest as deleted/no-longer-in-account workspaces (or in rare cases, very recently created ones before workspaces_latest has caught up).

If this answer resolves your question, could you mark it as โ€œAccept as Solutionโ€? That helps other users quickly find the correct fix.

 

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

View solution in original post

1 REPLY 1

Ashwin_DSA
Databricks Employee
Databricks Employee

Hey @Danish11052000,

Yes... This is expected and documented behaviour, not a bug.

system.access.workspaces_latest contains only active workspaces in the account. When a workspace is cancelled/removed from the account, its row is removed from this table. 

Ashwin_DSA_0-1776239927759.png

system.access.audit is a 365-day history of audit events for workspaces in the region, and it continues to store events even after a workspace has been deleted.

So when you run that particular query in your post, the rows you see are typically events for workspaces that used to exist (or were recently deprovisioned) but are no longer present in workspaces_latest. Because workspaces_latest is a slow-changing dimension of the latest state of active workspaces. It wonโ€™t retain those historical entries, while audit will.

If youโ€™re validating coverage, you can interpret workspace_ids that donโ€™t resolve in workspaces_latest as deleted/no-longer-in-account workspaces (or in rare cases, very recently created ones before workspaces_latest has caught up).

If this answer resolves your question, could you mark it as โ€œAccept as Solutionโ€? That helps other users quickly find the correct fix.

 

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***