How to get usage statistics from Databricks or Databricks SQL?

Mado
Valued Contributor II

Hi,

I am looking for a way to get usage statistics from Databricks (Data Science & Engineering and SQL persona).

For example:

  • I created a table. I want to know how many times a specific user queried that table.
  • How many times was a pipeline triggered?
  • How long did it take to run a DLT pipeline?

Is there any way to get usage statistics?

1 ACCEPTED SOLUTION

youssefmrini
Honored Contributor III

You can get that type of information by activating verbose audit logs: https://docs.databricks.com/administration-guide/account-settings/audit-logs.html

They contain a lot of important events and metrics that you can leverage to build dashboards.
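For example, once the audit logs are delivered to a storage location you control (the delivery mechanism differs by cloud), a rough sketch of finding who ran commands against a given table could look like the following; the path, table name, and exact schema here are assumptions, so check them against the audit log documentation:

```python
# Minimal sketch, assuming verbose audit logs are delivered as JSON files to a storage
# path you control -- the path, table name, and exact schema below are assumptions.
from pyspark.sql import functions as F

audit = spark.read.json("/mnt/logs/databricks-audit/")  # hypothetical delivery location

# With verbose audit logs enabled, notebook command executions typically appear as
# serviceName = "notebook", actionName = "runCommand", with the text in requestParams.
commands = (
    audit
    .filter((F.col("serviceName") == "notebook") & (F.col("actionName") == "runCommand"))
    .select(
        "timestamp",
        F.col("userIdentity.email").alias("user"),
        F.col("requestParams.commandText").alias("commandText"),
    )
)

# Rough count, per user, of commands that reference a given (hypothetical) table.
(commands
 .filter(F.col("commandText").contains("my_schema.my_table"))
 .groupBy("user")
 .count()
 .show())
```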

8 REPLIES

Anonymous
Not applicable

The Overwatch library may help. Unity Catalog also has some auditing to see who has accessed a table. For DLT, there are logs written to storage that you can read as a dataframe and parse for things like start and stop times; counting the starts tells you how often a pipeline was triggered.
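To make the DLT part concrete, a rough sketch of reading the event log and counting triggered updates could look like this; the storage path is a placeholder, and the event-log location and column names are my understanding of the DLT docs, so adjust for your pipeline:

```python
# Minimal sketch, assuming the DLT pipeline has a storage location configured and the
# event log is stored as a Delta table under <storage>/system/events (adjust as needed).
from pyspark.sql import functions as F

storage_path = "dbfs:/pipelines/my_pipeline"   # hypothetical storage location
events = spark.read.format("delta").load(f"{storage_path}/system/events")

# "update_progress" events track the lifecycle of each pipeline update (trigger).
updates = (
    events
    .filter(F.col("event_type") == "update_progress")
    .withColumn("state", F.get_json_object("details", "$.update_progress.state"))
    .withColumn("update_id", F.col("origin.update_id"))
)

# How many times was the pipeline triggered?
print(updates.filter(F.col("state") == "STARTING").select("update_id").distinct().count())

# How long did each update take? Use the first and last event timestamp per update.
(updates
 .groupBy("update_id")
 .agg(F.min("timestamp").alias("started_at"), F.max("timestamp").alias("ended_at"))
 .withColumn("duration_sec",
             F.col("ended_at").cast("long") - F.col("started_at").cast("long"))
 .show(truncate=False))
```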

Afeez_Raheem
New Contributor III

@Mohammad Saber​, I hope that you are well. I have the same request from my manager, and I was wondering if you have already found a way around it. We could work together if you have not found a solution just yet.

Please let's connect 🙂

Mado
Valued Contributor II

@Owo Akilo​ 

Here is what I have found so far.

If the workspace is enabled for Unity Catalog:

I set up Azure Log Analytics following the documentation. Now, I get the logs in Azure Log Analytics Workspace.

There is a table named "DatabricksUnityCatalog" where I can find the action name "getTable", and there I can see table names. But, for reasons I don't understand, Databricks doesn't record all table usage in "DatabricksUnityCatalog".

I queried tables over several days, but I only see the "getTable" action for one of those days.
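In case it helps anyone, this is roughly how the Log Analytics side could be queried from Python with the azure-monitor-query package; the workspace ID is a placeholder and the KQL column names are what I would expect from the diagnostic tables, so verify them in your own workspace:

```python
# Rough sketch, assuming the azure-monitor-query and azure-identity packages, and that
# the diagnostic table/column names match what you see in your Log Analytics workspace.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
workspace_id = "<log-analytics-workspace-id>"   # placeholder

kql = """
DatabricksUnityCatalog
| where ActionName == "getTable"
| project TimeGenerated, Identity, RequestParams
"""

response = client.query_workspace(workspace_id, kql, timespan=timedelta(days=7))
for table in response.tables:
    for row in table.rows:
        print(row)
```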

If the workspace is not enabled for Unity Catalog:

What I found is:

1) Using cluster logs for queries run in a notebook (Ref).

After enabling cluster logs, I can see table names in a text file "log4j-active", but this file changes whenever I restart the cluster. It seems the file is archived as .gz files, but I cannot find table names in the archived files for most of the queries I ran in the notebook.

2) Query history for queries run in the SQL persona (Ref).

We can see the query history in the Databricks user interface, but I am looking for a way to send it to Power BI and extract table names.

I don't understand how the API works or how to use the API specification file (Ref).
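For reference, a minimal sketch of calling the SQL query history REST API with a personal access token and crudely pulling table names out of the query text might look like this; the host and token are placeholders, and the response field names should be checked against the Query History API docs. The results could then be written to a table that Power BI reads.

```python
# Minimal sketch, assuming a personal access token with access to the SQL query history
# API; the host, token, and response field names ("res", "query_text", "user_name")
# should be verified against the Query History API documentation for your workspace.
import re
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"   # placeholder workspace URL
TOKEN = "<personal-access-token>"                              # placeholder

resp = requests.get(
    f"{HOST}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"max_results": 100},
)
resp.raise_for_status()

for q in resp.json().get("res", []):
    text = q.get("query_text", "")
    # Crude extraction of identifiers that follow FROM / JOIN keywords.
    tables = re.findall(r"(?:from|join)\s+([`\w.]+)", text, flags=re.IGNORECASE)
    print(q.get("user_name"), q.get("status"), tables)
```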

Afeez_Raheem
New Contributor III

@Mohammad Saber​, many thanks for sharing what you have learned so far! I appreciate it.

If you wouldn't mind, we could connect on a Teams call to look into this more closely. I am pretty new to learning the tool.

Mado
Valued Contributor II

@Owo Akilo​ 

All right, you can see my email in my profile.

Afeez_Raheem
New Contributor III

@Mohammad Saber​ thank you. I'll send you an email shortly.

Afeez_Raheem
New Contributor III

@Mohammad Saber​, sorry, it appears your email isn't listed there. Would you help confirm, please?
