Databricks Community

WWoman · ‎02-26-2024

Hello,

I would like to know if there is direct access to the Databricks query history tables. For compliance issues, I would like to be able to create reports for something like: who has accessed a particular column in a table in the past 6 months. The query history web interface is quite limited. I would ideally like wo use SQL to query the history table. Is this possible?

WWoman · ‎03-15-2024

I would like to be able to query the query history tables by running my own queries. I do not want to use the Query History interface supplied by Databricks; I want to be able to create python scripts that access the underling tables/views for TAC and Query history. From your response, it seems like this is not possible. Can you confirm that?

josh_melton · ‎04-19-2024

%pip install databricks-sdk
dbutils.library.restartPython()

warehouse_id = "bc1af43449227761"
hours_back_to_check = 2

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import sql
from pyspark.sql.types import StructType, StructField, StringType, LongType
import time

w = WorkspaceClient()
current_time_ms = int(round(time.time() * 1000))
start_time = current_time_ms - (3600000 * hours_back_to_check)

# Filtering options:
# https://databricks-sdk-py.readthedocs.io/en/latest/dbdataclasses/sql.html#databricks.sdk.service.sql.QueryFilter
query_filter = sql.QueryFilter(
    query_start_time_range=sql.TimeRange(start_time_ms=start_time, end_time_ms=current_time_ms),
    warehouse_ids=[warehouse_id]
)
query_ls = [query for query in w.query_history.list(filter_by=query_filter, include_metrics=True)]

schema = StructType([
    StructField("duration", LongType(), True),
    StructField("query_start_time_ms", LongType(), True),
    StructField("query_end_time_ms", LongType(), True),
    StructField("executed_as_user_name", StringType(), True),
    StructField("query_text", StringType(), True),
])

# Extract the relevant fields from the BaseRun objects
df = spark.createDataFrame([
    (query.duration, query.query_start_time_ms, query.query_end_time_ms, query.executed_as_user_name, query.query_text) 
    for query in query_ls
    ], schema)
df.display()

print([query.metrics for query in query_ls])

josh_melton · ‎04-19-2024

This provides query history. You may also be interested in System Tables - for compliance purposes check out Audit Logs:

https://docs.databricks.com/en/administration-guide/system-tables/audit-logs.html

josh_melton · ‎05-14-2024

For posterity, there is a query history system table which contains all of this information which is in preview at the time of me writing this. If you're reading this later than May 2024, please check the documentation for the query metrics table.

WWoman · ‎05-15-2024

A quick question on this... (First of all thanks so much for the sample code!). I'm playing around with this and I would like to get the statement_type and status. I see that duration, query_start_time_ms and query_end_time_ms are int data type and defines as LongType(); executed_as_user_name and query_text are str datatype and defined as StringType. statement_type and status are listed as data types QueryStatementType and QueryStatus respectively. How would I define the StructType for these fields?

josh_melton · ‎07-05-2024

Sorry, missed this - try the system table and query it in SQL instead, much simpler than defining StructTypes!

WWoman · ‎08-19-2024

The problem with that is that I do not have access to the raw system tables... I am writing a python script to load data into a table, so that we can run queries against it

ReiskaS · ‎05-15-2024

Thanks @josh_melton ! I was wondering right now about this (one day after your post!) since I only found the UI and API in the documentation and was really puzzled that there is no equivalent in unity to the Snowflake query_history table.

Databricks Community

Direct access to Databricks query history database

Connect with Databricks Users in Your Area

Join Us as a Community Technical Moderator

Databricks Community Champion - October 2024 - Filip Niziol

Become Our Next Monthly Community Champion!

Introducing Simple, Fast, and Scalable Batch LLM Inference on Mosaic AI Model Serving

Databricks Migration Strategy: Lessons Learned