
How to get usage statistics from Databricks or Databricks SQL?

Mado
Valued Contributor II

Hi,

I am looking for a way to get usage statistics from Databricks (Data Science & Engineering and SQL persona).

For example:

  • I created a table. I want to know how many times a specific user queried that table.
  • How many times was a pipeline triggered?
  • How long did it take to run a DLT pipeline?

Is there any way to get usage statistics?

1 ACCEPTED SOLUTION

youssefmrini
Honored Contributor III

You can get that type of information by activating verbose audit logs: https://docs.databricks.com/administration-guide/account-settings/audit-logs.html

They contain many important metrics that you can leverage to build dashboards.
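As a minimal sketch of what you can do with delivered audit logs: each record is a JSON object, so you can count, say, `getTable` actions per user. The field names below (`serviceName`, `actionName`, `userIdentity.email`) follow the documented audit-log schema, but treat them as assumptions and check them against your own log delivery; the sample records are fabricated.

```python
import json
from collections import Counter

def count_table_reads(log_lines):
    """Count audit events per user for the "getTable" action.

    Assumes each line is one JSON audit record with top-level fields
    `serviceName`, `actionName`, and `userIdentity` (verify against
    the audit-log schema for your workspace).
    """
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        if record.get("actionName") == "getTable":
            user = record.get("userIdentity", {}).get("email", "unknown")
            counts[user] += 1
    return counts

# Two fabricated sample records for illustration:
sample = [
    '{"serviceName": "unityCatalog", "actionName": "getTable", "userIdentity": {"email": "a@x.com"}}',
    '{"serviceName": "unityCatalog", "actionName": "getTable", "userIdentity": {"email": "a@x.com"}}',
]
print(count_table_reads(sample))  # Counter({'a@x.com': 2})
```

From there, writing the counts to a table and pointing a dashboard at it is straightforward.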


9 REPLIES

Anonymous
Not applicable

The Overwatch library may help. Unity Catalog also has some auditing to see who has accessed a table. For DLT, there are logs written to storage that you can read as a dataframe and parse for things like start and stop times; counting the starts tells you how often a pipeline is triggered.
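The DLT event-log parsing described above might look like the following sketch. The event shape here (a `timestamp`, an `event_type` such as `create_update` or `update_progress`, and a `state`) is a simplified assumption; the real event log is a Delta table whose exact schema is described in the DLT documentation, so adjust the field access accordingly.

```python
from datetime import datetime

def pipeline_run_stats(events):
    """Derive trigger count and run durations from DLT event-log rows.

    `events`: list of dicts with `timestamp` (ISO-8601 string),
    `event_type`, and, for progress events, a `state` field.
    This shape is an assumption -- map it to the real event-log schema.
    """
    starts, durations = 0, []
    start_time = None
    for e in sorted(events, key=lambda e: e["timestamp"]):
        ts = datetime.fromisoformat(e["timestamp"])
        if e["event_type"] == "create_update":
            # Each update creation counts as one pipeline trigger.
            starts += 1
            start_time = ts
        elif e["event_type"] == "update_progress" and e.get("state") == "COMPLETED":
            if start_time is not None:
                durations.append((ts - start_time).total_seconds())
                start_time = None
    return starts, durations

# Fabricated example: one update that completes after 60 seconds.
events = [
    {"timestamp": "2023-05-01T10:00:00", "event_type": "create_update"},
    {"timestamp": "2023-05-01T10:01:00", "event_type": "update_progress", "state": "COMPLETED"},
]
print(pipeline_run_stats(events))  # (1, [60.0])
```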

youssefmrini
Honored Contributor III

You can get that type of information by activating verbose audit logs: https://docs.databricks.com/administration-guide/account-settings/audit-logs.html

They contain many important metrics that you can leverage to build dashboards.

Afeez_Raheem
New Contributor III

@Mohammad Saber, I hope that you are well. I have the same request from my manager. I was wondering if you have already found a way to do this. We could work together if you have not found the solution just yet.

Please let's connect 🙂

Mado
Valued Contributor II

@Owo Akilo

Here is what I have found so far.

If Workspace is enabled for Unity Catalog:

I set up Azure Log Analytics following the documentation. Now, I get the logs in Azure Log Analytics Workspace.


There is a table named "DatabricksUnityCatalog" where I can find the action name "getTable". Here, I can see table names. But, for reasons I don't understand, Databricks doesn't save all table usage information in "DatabricksUnityCatalog".

I queried tables for a few days, but I only see the "getTable" action for one of those days.

If Workspace is not enabled for Unity Catalog:

What I found is:

1) Using cluster logs for queries run in a notebook (Ref).

After enabling cluster logs, I can see table names in a text file, "log4j-active", but this file changes whenever I start the cluster. It seems that this file is archived as gz files, but I cannot find table names in the archived files for most queries I ran in the notebook.

2) Query history for queries run in the SQL persona (Ref).

We can see the query history in the Databricks user interface, but I am looking for a way to send it to Power BI and extract table names.

I don't understand how the API works or how to use the API Specification file (Ref).
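One possible approach, sketched below: pull query texts via the Query History REST API (`GET /api/2.0/sql/history/queries`, authenticated with a personal access token), then pull table names out of the SQL text. The response field names in the comment are assumptions to verify against the API reference, and the regex is deliberately naive (it misses subqueries, CTEs, and quoted identifiers), so treat it only as a starting point.

```python
import re

# In a real workspace you would fetch the history with something like:
#
#   resp = requests.get(
#       f"https://{workspace_host}/api/2.0/sql/history/queries",
#       headers={"Authorization": f"Bearer {token}"},
#   )
#   query_texts = [q["query_text"] for q in resp.json()["res"]]
#
# (`workspace_host`, `token`, and the response field names are
# illustrative assumptions -- check the Query History API reference.)

# Naive extraction of identifiers that follow FROM or JOIN:
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def extract_table_names(query_text):
    """Return table-like identifiers appearing after FROM/JOIN."""
    return TABLE_RE.findall(query_text)

print(extract_table_names(
    "SELECT * FROM sales.orders o JOIN sales.customers c ON o.id = c.id"
))  # ['sales.orders', 'sales.customers']
```

The extracted names could then be written to a table that Power BI reads, instead of pointing Power BI at the API directly.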

Afeez_Raheem
New Contributor III

@Mohammad Saber Many thanks for sharing your learnings thus far! I appreciate it.

If you wouldn't mind, we could connect on a Teams call to look into this together. I am pretty new to learning the tool.

Mado
Valued Contributor II

@Owo Akilo

All right, you can see my email in my profile.

Afeez_Raheem
New Contributor III

@Mohammad Saber, thank you. I'll be sending you an email shortly.

Afeez_Raheem
New Contributor III

@Mohammad Saber, sorry, it appears your email isn't in your profile. Could you confirm, please?

hoffman
New Contributor II

You can use System Tables, now available in the Unity Catalog metastore, to build the views you described: https://docs.databricks.com/en/admin/system-tables/index.html
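A minimal sketch of that idea, assuming a Unity Catalog-enabled workspace with the `system.access.audit` system table enabled: in a notebook you would run the query with `spark.sql(query)`; here we only build the query string. The column names (`user_identity`, `action_name`, `request_params.full_name_arg`) and the table name `main.sales.orders` are assumptions to check against the system-tables reference.

```python
# Hypothetical example: count "getTable" reads of one table per user.
# Column and parameter names below are assumptions -- verify them in
# the system.access.audit schema documentation before relying on this.
query = """
SELECT user_identity.email AS user, count(*) AS reads
FROM system.access.audit
WHERE action_name = 'getTable'
  AND request_params.full_name_arg = 'main.sales.orders'
GROUP BY user_identity.email
"""
print(query)
```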