How to get read/write bytes per table using Databricks system tables?

Danish11052000
Contributor

I’m working on a data usage use case and want to understand the right way to get read bytes and written bytes per table in Databricks, especially for Unity Catalog tables.

What I want

For each table, something like:

  • Date

  • Table name (catalog.schema.table)

  • Operation type (read/write)

  • User

  • Bytes read

  • Bytes written

Initially, I assumed I could get this directly from the UC audit logs via the system.access.audit system table, because that’s the “audit log” source. But while exploring, I realized a few things are missing

  1. system.access.audit

    • Focused on governance and admin actions (e.g., grants, ownership changes, some access checks).

    • Great for “who did what” from a security perspective.

    • Does not expose per-table read bytes / written_bytes metrics.

Is there any direct way from system.access.audit to get per-table read/write bytes, or is audit intentionally scoped only to governance events (and not IO metrics)?

Hubert-Dudek
Databricks MVP

Actually, I was importing it from Azure Metrics and joining the folder with the location from information schema but maybe there is easier way


My blog: https://databrickster.medium.com/

Could you please provide some scripts or samples for that? It would be really helpful to understand.

Here I explained how I enabled diagnostic settings https://medium.com/@databrickster/hidden-benefit-of-databricks-managed-tables-f9ff8e1801ac


My blog: https://databrickster.medium.com/

View solution in original post

balajij8
Contributor III

System Audit tables are for account activity tracking & security.

For IO details

  • You can use query history & lineage. Add usage attribution code to get details at table level
  • You can use table logs to get the write info along with other details.

pradeep_singh
Contributor III

system.access.audit focuses on governance and admin/security events. It doesn’t capture per-table I/O metrics such as read_bytes or written_bytes.

  • Use system.query.history for per-statement I/O metrics (read_bytes, written_bytes, read_rows, written_rows, etc.).
  • Use system.access.table_lineage to identify which Unity Catalog tables were read or written. Join on statement_id where available. 
    See if below query works for your need . 
-- Per-table read/write bytes, allocated across sources/targets per statement
WITH lineage AS (
  SELECT
    event_date,
    statement_id,
    created_by,
    source_table_full_name,
    target_table_full_name,
    source_type,
    target_type
  FROM system.access.table_lineage
  WHERE event_date BETWEEN date_sub(current_date(), 30) AND current_date()
),
qh AS (
  SELECT
    statement_id,
    executed_by,
    CAST(start_time AS DATE) AS date,
    read_bytes,
    written_bytes,
    execution_status
  FROM system.query.history
  WHERE execution_status = 'FINISHED'
)
, per_table AS (
  SELECT
    COALESCE(l.source_table_full_name, l.target_table_full_name) AS table_full_name,
    CASE
      WHEN l.source_type IS NOT NULL AND l.target_type IS NULL THEN 'read'
      WHEN l.target_type IS NOT NULL AND l.source_type IS NULL THEN 'write'
      WHEN l.source_type IS NOT NULL AND l.target_type IS NOT NULL THEN 'read_write'
      ELSE 'unknown'
    END AS operation_type,
    qh.executed_by AS user,
    qh.date AS date,
    -- allocate bytes across all sources/targets in the statement
    CASE
      WHEN l.source_type IS NOT NULL THEN
        qh.read_bytes / NULLIF(COUNT(l.source_table_full_name) OVER (PARTITION BY l.statement_id), 0)
      ELSE 0
    END AS bytes_read_alloc,
    CASE
      WHEN l.target_type IS NOT NULL THEN
        qh.written_bytes / NULLIF(COUNT(l.target_table_full_name) OVER (PARTITION BY l.statement_id), 0)
      ELSE 0
    END AS bytes_written_alloc
  FROM lineage l
  JOIN qh ON l.statement_id = qh.statement_id
)
SELECT
  date,
  table_full_name AS table,
  operation_type AS operation,
  user,
  SUM(bytes_read_alloc) AS bytes_read,
  SUM(bytes_written_alloc) AS bytes_written
FROM per_table
GROUP BY date, table_full_name, operation_type, user
ORDER BY date DESC;

 

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev