02-13-2023 02:54 PM
Hi,
I am looking for a way to get usage statistics from Databricks (Data Science & Engineering and SQL personas). For example, which tables are accessed, how often, and by whom.
Is there any way to get usage statistics?
02-14-2023 05:15 AM
The Overwatch library may help. Unity Catalog also has some auditing that shows who has accessed a table. For DLT, event logs are written to storage; you can read them as a DataFrame and parse them for things like start and stop times, and count the starts to determine how often a pipeline is triggered (see the sketch below).
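As a rough sketch of that last point, assuming the pipeline was created with a storage location so its event log lands under system/events (the path below is a placeholder, and the event_type value used to count starts is an assumption to verify against your own log):

```python
# Sketch: read the DLT event log as a DataFrame in a Databricks notebook.
from pyspark.sql import functions as F

# Placeholder path; the event log is a Delta table under the pipeline's storage location.
event_log_path = "dbfs:/pipelines/<pipeline-id>/system/events"

events = spark.read.format("delta").load(event_log_path)

# Count update starts per day as a proxy for how often the pipeline is triggered.
# Filtering on event_type == "create_update" is an assumption; inspect
# events.select("event_type").distinct() first to confirm the value to use.
starts_per_day = (
    events
    .filter(F.col("event_type") == "create_update")
    .groupBy(F.to_date("timestamp").alias("day"))
    .count()
    .orderBy("day")
)
starts_per_day.show()
```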
02-23-2023 01:45 AM
You can get that type of information by activating verbose audit logs: https://docs.databricks.com/administration-guide/account-settings/audit-logs.html
They contain a lot of important metrics that you can leverage to build dashboards (a small sketch of summarizing delivered logs is below).
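For instance, if the audit logs are delivered to cloud storage as JSON, a minimal sketch like the following could summarize them. The path is a placeholder, and the field names (serviceName, actionName) follow the documented audit log schema but may be nested differently depending on how your cloud delivers the logs:

```python
# Sketch: summarize delivered audit logs with PySpark (Databricks notebook).
from pyspark.sql import functions as F

# Placeholder path; point this at your configured audit-log delivery location.
audit_path = "abfss://logs@<storage-account>.dfs.core.windows.net/audit-logs/"

audit = spark.read.json(audit_path)

# Count events per service and action, e.g. to see which workloads are most active.
# On Azure diagnostic exports these fields may sit under a nested "properties" struct.
summary = (
    audit
    .groupBy("serviceName", "actionName")
    .count()
    .orderBy(F.desc("count"))
)
summary.show(50, truncate=False)
```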
02-23-2023 01:46 PM
@Mohammad Saber, I hope that you are well. I have the same request from my manager, and I was wondering whether you have already found a way to meet it. If not, we could work on it together.
Please let's connect!
02-23-2023 05:14 PM
@Owo Akilo
Here is what I have found so far.
If the workspace is enabled for Unity Catalog:
I set up Azure Log Analytics following the documentation, and I now get the logs in an Azure Log Analytics workspace.
There is a table named "DatabricksUnityCatalog" where I can find the action name "getTable", and there I can see table names. But, for reasons I don't understand, Databricks doesn't save all table usage information in "DatabricksUnityCatalog".
I queried tables on several days, but I only see the "getTable" action for one of those days (a query sketch follows).
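For anyone who wants to pull those "getTable" events out of Log Analytics programmatically rather than through the portal, a sketch along these lines should work; the workspace ID is a placeholder, and the column names (ActionName, Identity, RequestParams) are assumptions to verify in the portal first:

```python
# Sketch: query the DatabricksUnityCatalog table in Azure Log Analytics.
# Requires: pip install azure-identity azure-monitor-query
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Column names are assumptions; adjust after inspecting the table in the portal.
kql = """
DatabricksUnityCatalog
| where ActionName == "getTable"
| project TimeGenerated, Identity, RequestParams
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query=kql,
    timespan=timedelta(days=7),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```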
If the workspace is not enabled for Unity Catalog:
What I found is:
1) Using cluster logs for queries run in a notebook (Ref).
After enabling cluster logs, I can see table names in a text file "log4j-active", but this file changes whenever I start the cluster. It seems the file is archived as gz files, but I cannot find table names in the archived files for most of the queries I ran in the notebook (see the first sketch after this list).
2) Query history for queries run in the SQL persona (Ref).
We can see the query history in the Databricks user interface, but I am looking for a way to send it to Power BI and extract table names.
I don't understand how the API works or how to use the API Specification file (Ref) (see the second sketch below).
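On point 1, a minimal sketch for scanning the delivered driver logs (including the archived gz files) for a given table name might look like this; the log path and the fully qualified table name are placeholders, and whether table names show up at all depends on the cluster's logging level:

```python
# Sketch: grep-style scan of delivered driver log4j files from a Databricks notebook.
from pyspark.sql import functions as F

# Placeholder path; cluster log delivery writes driver logs (and .gz archives) here.
log_path = "dbfs:/cluster-logs/<cluster-id>/driver/log4j-*"

logs = spark.read.text(log_path)  # gzip archives are decompressed transparently

# Filter lines mentioning a (placeholder) table name.
logs.filter(F.col("value").contains("my_schema.my_table")).show(truncate=False)
```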
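On point 2, the Query History REST API can return the same history as JSON, which Power BI (or anything else) can consume. A rough sketch with Python, assuming a personal access token and treating the response field names (res, query_text, user_name) as assumptions to check against the API reference:

```python
# Sketch: pull SQL query history via the Databricks REST API.
import os

import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-xxxx.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

resp = requests.get(
    f"{host}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {token}"},
    params={"max_results": 100},
)
resp.raise_for_status()

# Field names are assumptions; verify them against the Query History API reference.
for query in resp.json().get("res", []):
    print(query.get("user_name"), "-", (query.get("query_text") or "")[:80])
```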
02-23-2023 05:45 PM
@Mohammad Saber Many thanks for sharing your learnings so far! I appreciate it.
If you wouldn't mind, could we connect on a Teams call to look into this together? I am pretty new to learning the tool.
02-23-2023 06:08 PM
@Owo Akilo
All right, you can see my email in my profile.
02-24-2023 02:38 AM
@Mohammad Saber, thank you. I'll be sending you an email shortly.
02-24-2023 02:43 AM
@Mohammad Saber, sorry, it appears your email isn't visible there. Would you help confirm, please?
05-15-2024 11:51 PM - edited 05-15-2024 11:52 PM
You can use System Tables, now available in the Unity Catalog metastore, to build the views you described: https://docs.databricks.com/en/admin/system-tables/index.html
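As an illustration, assuming the query history system table is enabled in your metastore (the table and column names below are what I would expect from the documentation, but verify them in your workspace):

```python
# Sketch: summarize SQL usage from Unity Catalog system tables (Databricks notebook).
recent = spark.sql("""
    SELECT executed_by,
           statement_text,
           start_time
    FROM system.query.history
    WHERE start_time >= current_date() - INTERVAL 30 DAYS
""")

# Most active users over the last 30 days.
(
    recent
    .groupBy("executed_by")
    .count()
    .orderBy("count", ascending=False)
    .show(20, truncate=False)
)
```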