Friday - last edited Friday
I’m working on a data usage use case and want to understand the right way to get read bytes and written bytes per table in Databricks, especially for Unity Catalog tables.
For each table, something like:
Date
Table name (catalog.schema.table)
Operation type (read/write)
User
Bytes read
Bytes written
Initially, I assumed I could get this directly from the UC audit logs via the system.access.audit system table, since that's the "audit log" source. But while exploring, I realized a few things are missing:
system.access.audit
Focused on governance and admin actions (e.g., grants, ownership changes, some access checks).
Great for “who did what” from a security perspective.
Does not expose per-table read_bytes / written_bytes metrics.
Is there any direct way from system.access.audit to get per-table read/write bytes, or is audit intentionally scoped only to governance events (and not IO metrics)?
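For context, here is a sketch of what system.access.audit does capture. The filter values below (e.g. service_name = 'unityCatalog') are illustrative assumptions; the exact action_name values you'll see depend on your workspace's activity.

```sql
-- Sketch: sample recent Unity Catalog audit events.
-- You will see governance/access actions (grants, getTable, etc.),
-- but no byte-level I/O columns anywhere in this table.
SELECT
  event_date,
  user_identity.email AS actor,
  service_name,
  action_name,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= date_sub(current_date(), 7)
ORDER BY event_time DESC
LIMIT 100;
```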
Friday
Actually, I was importing it from Azure Metrics and joining the storage folder with the table location from the information schema, but maybe there is an easier way.
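If you're matching Azure storage metrics to tables by path, one way to get a table's storage location (for a Delta table) is DESCRIBE DETAIL; the table name below is hypothetical.

```sql
-- Returns a one-row result whose `location` column is the table's storage
-- path, which can be matched against the folder seen in Azure Storage
-- metrics/diagnostic logs.
DESCRIBE DETAIL main.sales.orders;  -- hypothetical catalog.schema.table
```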
Sunday
Here I explained how I enabled diagnostic settings https://medium.com/@databrickster/hidden-benefit-of-databricks-managed-tables-f9ff8e1801ac
Friday
System Audit tables are for account activity tracking & security.
For I/O details, look at other system tables such as system.query.history.
Sunday - last edited Sunday
system.access.audit focuses on governance and admin/security events. It doesn’t capture per-table I/O metrics such as read_bytes or written_bytes.
-- Per-table read/write bytes, allocated across sources/targets per statement
WITH lineage AS (
SELECT
event_date,
statement_id,
created_by,
source_table_full_name,
target_table_full_name,
source_type,
target_type
FROM system.access.table_lineage
WHERE event_date BETWEEN date_sub(current_date(), 30) AND current_date()
),
qh AS (
SELECT
statement_id,
executed_by,
    CAST(start_time AS DATE) AS `date`,
read_bytes,
written_bytes,
execution_status
FROM system.query.history
WHERE execution_status = 'FINISHED'
),
per_table AS (
SELECT
    -- NOTE: when a statement both reads and writes (source and target set on
    -- the same lineage row), the row is attributed to the source table here
    COALESCE(l.source_table_full_name, l.target_table_full_name) AS table_full_name,
    CASE
      WHEN l.source_type IS NOT NULL AND l.target_type IS NULL THEN 'read'
      WHEN l.target_type IS NOT NULL AND l.source_type IS NULL THEN 'write'
      WHEN l.source_type IS NOT NULL AND l.target_type IS NOT NULL THEN 'read_write'
      ELSE 'unknown'
    END AS operation_type,
    qh.executed_by AS `user`,
    qh.`date` AS `date`,
-- allocate bytes across all sources/targets in the statement
CASE
WHEN l.source_type IS NOT NULL THEN
qh.read_bytes / NULLIF(COUNT(l.source_table_full_name) OVER (PARTITION BY l.statement_id), 0)
ELSE 0
END AS bytes_read_alloc,
CASE
WHEN l.target_type IS NOT NULL THEN
qh.written_bytes / NULLIF(COUNT(l.target_table_full_name) OVER (PARTITION BY l.statement_id), 0)
ELSE 0
END AS bytes_written_alloc
FROM lineage l
JOIN qh ON l.statement_id = qh.statement_id
)
SELECT
  `date`,
  table_full_name AS `table`,
  operation_type AS operation,
  `user`,
  SUM(bytes_read_alloc) AS bytes_read,
  SUM(bytes_written_alloc) AS bytes_written
FROM per_table
GROUP BY `date`, table_full_name, operation_type, `user`
ORDER BY `date` DESC;
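As a sanity check (lineage doesn't cover every statement type), you can pull daily I/O totals per user straight from system.query.history, without the per-table split. These are the statement-level totals that the lineage join above allocates across tables.

```sql
-- Daily I/O totals per user from query history alone.
SELECT
  CAST(start_time AS DATE) AS `date`,
  executed_by,
  SUM(read_bytes)    AS total_bytes_read,
  SUM(written_bytes) AS total_bytes_written
FROM system.query.history
WHERE execution_status = 'FINISHED'
  AND start_time >= date_sub(current_date(), 30)
GROUP BY CAST(start_time AS DATE), executed_by
ORDER BY `date` DESC;
```

If the per-table sums from the allocated query diverge significantly from these totals, statements missing from table_lineage are the likely cause.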