
Direct access to Databricks query history database

WWoman
New Contributor II

Hello,

I would like to know if there is direct access to the Databricks query history tables. For compliance purposes, I need to create reports such as: who has accessed a particular column in a table in the past 6 months? The query history web interface is quite limited. I would ideally like to use SQL to query the history table. Is this possible?

 

4 REPLIES

Kaniz
Community Manager

Hi @WWoman, to address your compliance requirements and track who has accessed specific columns in a table over the past 6 months, you can follow these steps:

  1. Databricks Table Access Control (TAC):
  • Databricks provides an audit and control feature called Table Access Control (TAC), which allows you to monitor and manage access to tables within Databricks.
  • By enabling TAC, you can audit table access and view logs related to table usage. These logs are stored in the "DatabricksTableAccessControl" log table.
  2. Query History:
  • Databricks also maintains a query history that records executed SQL queries, which you can access to see who ran which statements.

Remember that these features provide valuable insights into user activity, but they should be used responsibly and in accordance with your organization's policies.
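To illustrate the TAC point above: you can inspect who currently holds privileges on a table with a `SHOW GRANTS` statement. A minimal sketch, assuming a Databricks notebook with a Spark session; the catalog/schema/table name is a placeholder:

```python
# Sketch: build a SHOW GRANTS statement for a table.
# "main.sales.transactions" is a hypothetical table name - substitute your own.
def show_grants_sql(table_fqn: str) -> str:
    """Return a SHOW GRANTS statement for a fully qualified table name."""
    return f"SHOW GRANTS ON TABLE {table_fqn}"

stmt = show_grants_sql("main.sales.transactions")
print(stmt)  # SHOW GRANTS ON TABLE main.sales.transactions

# In a Databricks notebook you would then run:
# spark.sql(stmt).display()
```

Note this shows current grants, not historical access; for "who actually read the data", you need the audit/query history sources discussed below in the thread.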

WWoman
New Contributor II

I would like to be able to query the query history tables by running my own queries. I do not want to use the Query History interface supplied by Databricks; I want to be able to create Python scripts that access the underlying tables/views for TAC and query history. From your response, it seems like this is not possible. Can you confirm that?

josh_melton
New Contributor III
%pip install databricks-sdk
dbutils.library.restartPython()
warehouse_id = "bc1af43449227761"  # replace with your SQL warehouse ID
hours_back_to_check = 2
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import sql
from pyspark.sql.types import StructType, StructField, StringType, LongType
import time

w = WorkspaceClient()
current_time_ms = int(round(time.time() * 1000))
start_time = current_time_ms - (3600000 * hours_back_to_check)
# Filtering options:
# https://databricks-sdk-py.readthedocs.io/en/latest/dbdataclasses/sql.html#databricks.sdk.service.sql.QueryFilter
query_filter = sql.QueryFilter(
    query_start_time_range=sql.TimeRange(start_time_ms=start_time, end_time_ms=current_time_ms),
    warehouse_ids=[warehouse_id]
)
query_ls = [query for query in w.query_history.list(filter_by=query_filter, include_metrics=True)]
schema = StructType([
    StructField("duration", LongType(), True),
    StructField("query_start_time_ms", LongType(), True),
    StructField("query_end_time_ms", LongType(), True),
    StructField("executed_as_user_name", StringType(), True),
    StructField("query_text", StringType(), True),
])

# Extract the relevant fields from the QueryInfo objects
df = spark.createDataFrame([
    (query.duration, query.query_start_time_ms, query.query_end_time_ms, query.executed_as_user_name, query.query_text) 
    for query in query_ls
    ], schema)
df.display()
print([query.metrics for query in query_ls])

This provides query history. You may also be interested in System Tables - for compliance purposes check out Audit Logs:

https://docs.databricks.com/en/administration-guide/system-tables/audit-logs.html
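Once the audit log system table is enabled, the report the original question asks for (who accessed a given table in the last 6 months) can be approximated with plain SQL. A hedged sketch that builds such a query against `system.access.audit`; the column names (`event_time`, `user_identity`, `request_params`, and especially `request_params.full_name_arg`) are assumptions based on the documented audit log schema, so verify them against your workspace before relying on this:

```python
# Sketch: build a SQL query over system.access.audit to list accesses to a
# table in roughly the last N months. Field names are assumptions - check the
# audit log schema in your workspace. 30 days/month is used as a rough window.
def column_access_report_sql(table_name: str, months: int = 6) -> str:
    return f"""
        SELECT event_time, user_identity.email, action_name, request_params
        FROM system.access.audit
        WHERE event_time >= date_sub(current_date(), {months * 30})
          AND request_params.full_name_arg = '{table_name}'
        ORDER BY event_time DESC
    """

query = column_access_report_sql("main.sales.transactions")
# In a Databricks notebook: spark.sql(query).display()
print(query)
```

Since audit events are recorded per table (not per column), a column-level report would additionally need to parse the statement text from `system.query.history`.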
