02-13-2023 02:54 PM
Hi,
I am looking for a way to get usage statistics from Databricks (Data Science & Engineering and SQL personas).
For example:
Is there any way to get usage statistics?
02-14-2023 05:15 AM
The Overwatch library may help. Unity Catalog also has some auditing to show who has accessed a table. For DLT, logs are written to storage; you can read them as a DataFrame and parse for things like start and stop times, and count the starts to determine how often a pipeline is triggered.
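The DLT log parsing mentioned above might look like the following sketch. The event rows would normally come from reading the pipeline's storage location as a DataFrame; the "create_update" event type used to count starts, and the field names, are assumptions to verify against your own logs.

```python
# Sketch: counting pipeline starts from DLT event-log rows.
# Assumes events collected into dicts with "event_type" and "timestamp"
# fields, and that a start shows up as "create_update" -- both are
# assumptions to check against your actual event log.
from collections import Counter

def count_pipeline_starts(events):
    """Count how many times the pipeline was started."""
    counts = Counter(e["event_type"] for e in events)
    return counts.get("create_update", 0)

sample = [
    {"event_type": "create_update", "timestamp": "2023-02-01T08:00:00Z"},
    {"event_type": "flow_progress", "timestamp": "2023-02-01T08:01:00Z"},
    {"event_type": "create_update", "timestamp": "2023-02-02T08:00:00Z"},
]
print(count_pipeline_starts(sample))  # 2
```

Grouping the same counts by day would give a rough trigger frequency per pipeline.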
02-23-2023 01:45 AM
You can get that type of information by enabling verbose audit logs. https://docs.databricks.com/administration-guide/account-settings/audit-logs.html
They contain a lot of important metrics that you can leverage to build dashboards.
02-23-2023 01:46 PM
@Mohammad Saber , I hope that you are well. I have the same request from my manager and was wondering whether you have already found a way around it. We could work together if you have not found a solution just yet.
Please let's connect 🙂
02-23-2023 05:14 PM
@Owo Akilo
What I have found so far.
If Workspace is enabled for Unity Catalog:
I set up Azure Log Analytics following the documentation. Now, I get the logs in Azure Log Analytics Workspace.
There is a table named "DatabricksUnityCatalog" where I can find the action name "getTable", and there I can see table names. But, for reasons I don't understand, Databricks doesn't save all table-usage information in "DatabricksUnityCatalog": I queried tables on several days, but I only see the "getTable" action for one of those days.
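For the records that do land in the audit logs, pulling out the table names is a small filtering job. A minimal sketch, assuming records shaped like Databricks audit-log entries with "actionName" and "requestParams" fields; the "full_name_arg" key is an assumption to check against your own export:

```python
import json

# Sketch: extracting table names from exported audit-log records.
# Field names ("actionName", "requestParams", "full_name_arg") are
# assumptions modeled on Databricks audit-log entries -- verify them
# against a real export before relying on this.
def tables_accessed(records):
    names = []
    for rec in records:
        if rec.get("actionName") == "getTable":
            full_name = rec.get("requestParams", {}).get("full_name_arg")
            if full_name:
                names.append(full_name)
    return names

raw = '''[
  {"actionName": "getTable", "requestParams": {"full_name_arg": "main.sales.orders"}},
  {"actionName": "listTables", "requestParams": {}}
]'''
print(tables_accessed(json.loads(raw)))  # ['main.sales.orders']
```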
If Workspace is not enabled for Unity Catalog:
What I found is:
1) Using cluster logs for queries run in a notebook (Ref).
After enabling cluster logs, I can see table names in a text file, "log4j-active", but this file is reset whenever I start the cluster. It seems the file is archived into gz files, but I cannot find table names in the archived files for most of the queries I ran in the notebook.
2) Query history for queries run in the SQL persona (Ref).
We can see the query history in the Databricks user interface, but I am looking for a way to send it to Power BI and extract table names.
I don't understand how the API works or how to use the API Specification file (Ref).
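For what it's worth, the query history endpoint can be called with nothing beyond the Python standard library. A minimal sketch, assuming a workspace host and a personal access token (both placeholders), against GET /api/2.0/sql/history/queries; the "res" and "query_text" response fields should be double-checked against the API Specification:

```python
# Sketch: fetching SQL query history over REST with the stdlib only.
# Host and token below are placeholders, not real credentials.
import json
import urllib.request

def build_query_history_request(host, token, max_results=100):
    """Assemble the request for one page of SQL query history."""
    url = (f"https://{host}/api/2.0/sql/history/queries"
           f"?max_results={max_results}")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

def fetch_query_texts(req):
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Each entry in "res" should carry a "query_text" field you can
    # then parse for table names (field names assumed from the API docs).
    return [q.get("query_text", "") for q in payload.get("res", [])]

req = build_query_history_request("adb-123.4.azuredatabricks.net", "dapi-XXXX")
print(req.full_url)
```

The returned query texts could then be loaded into Power BI, or parsed for table names with a SQL parser.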
02-23-2023 05:45 PM
@Mohammad Saber Many thanks for sharing your learnings so far! I appreciate it.
If you wouldn't mind, we could connect on a Teams call to look into this more closely. I am pretty new to the tool.
02-23-2023 06:08 PM
@Owo Akilo
All right, you can see my email in my profile.
02-24-2023 02:38 AM
@Mohammad Saber thank you. I'll be sending you an email shortly.
02-24-2023 02:43 AM
@Mohammad Saber , sorry, it appears you don't have your email there. Would you help confirm, please?