Databricks: Report on SQL queries that are being executed

enichante
New Contributor

We have a SQL workspace with a running cluster that serves a number of self-service reports against a range of datasets. We want to analyse and report on the queries our self-service users are executing, so we can get better visibility of who is using the data platform and which tables are being used, and how. Ideally we would do this reporting in the Databricks SQL workspace itself rather than in another tool.

All this information is available in the UI under Query History, but not in a form we can easily analyse or build graphs from.

We know there is an API to pull the query history shown in the UI. However, it seems convoluted to call the API to fetch data about our cluster, only to ingest that data back into the same cluster so we can query it.

What is the best way to get query history information into a Hive table so we can query, analyse and graph it?

1 ACCEPTED SOLUTION

BilalAslamDbrx
Honored Contributor II

@Werner Stinckens is right, the API is the way to go -- for now! We want to make this a better experience for you, e.g. by giving you a system table you can query directly, without having to extract the data with an API and re-ingest it.

4 REPLIES

-werners-
Esteemed Contributor III

The API is the way to go.
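
For anyone landing here, a minimal sketch of what pulling the history through the API could look like. The endpoint path and the response field names (`res`, `has_next_page`, `next_page_token`) are assumptions based on the Query History REST API and should be checked against the docs for your workspace; the host and token are placeholders. The paging loop takes a page-fetching callable so it can be exercised without a live workspace.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, Optional

# Assumed endpoint path for the Query History API; verify against your workspace's REST docs.
QUERY_HISTORY_PATH = "/api/2.0/sql/history/queries"


def http_get_page(host: str, token: str, page_token: Optional[str], max_results: int = 100) -> Dict:
    """Fetch one page of query history over HTTPS (host and token are placeholders)."""
    params = {"max_results": str(max_results)}
    if page_token:
        params["page_token"] = page_token
    url = f"{host.rstrip('/')}{QUERY_HISTORY_PATH}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_query_history(get_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield query records page by page until the API reports no further page."""
    page_token = None
    while True:
        page = get_page(page_token)
        # "res", "has_next_page" and "next_page_token" are assumed response field names.
        yield from page.get("res", [])
        if not page.get("has_next_page"):
            break
        page_token = page.get("next_page_token")
        if not page_token:
            break
```

Something like `iter_query_history(lambda t: http_get_page("https://<workspace-host>", "<token>", t))` would then stream the records, ready to be written wherever you want to query them.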

Chris_Grabiel
New Contributor III

Agree with @Werner Stinckens. We built a lake pipeline to feed that data via the API into lake storage, so we could keep more query history and combine that history across workspaces.
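
A rough sketch of the landing step in such a pipeline, not the actual implementation described above. It assumes each API record is a dict; the field names in `KEPT_FIELDS` are illustrative assumptions, and `to_rows`, `merge_rows` and `to_json_lines` are hypothetical helper names. Rows are tagged with their workspace so history from several workspaces can sit in one table, and repeated pulls are de-duplicated on `(workspace, query_id)`.

```python
import json
from typing import Dict, Iterable, List

# Field names below are illustrative assumptions about the API's query records.
KEPT_FIELDS = ("query_id", "user_name", "status", "query_text", "query_start_time_ms", "duration")


def to_rows(workspace: str, records: Iterable[Dict]) -> List[Dict]:
    """Flatten API records into uniform rows tagged with the workspace they came from."""
    return [{"workspace": workspace, **{f: rec.get(f) for f in KEPT_FIELDS}} for rec in records]


def merge_rows(existing: Iterable[Dict], new: Iterable[Dict]) -> List[Dict]:
    """Combine rows from repeated pulls, keeping one row per (workspace, query_id)."""
    merged = {}
    for row in list(existing) + list(new):
        merged[(row["workspace"], row["query_id"])] = row  # the later pull wins
    return list(merged.values())


def to_json_lines(rows: Iterable[Dict]) -> str:
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)
```

Newline-delimited JSON is used here because Spark reads it directly, so the landed files can be exposed as a table and queried from the SQL workspace.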


Anonymous
Not applicable

Looks like the people have spoken: the API is your best option! (Thanks @Werner Stinckens, @Chris Grabiel and @Bilal Aslam!)

@eni chante Let us know if you have questions about the API! If not, please mark one of the replies above as the "best answer". That way we know the case is closed.

We would also love to know what creative solutions you came up with using our API. Feel free to reply below and share the knowledge! Talk soon.
