cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How to get read/write bytes per table using Databricks system tables?

Danish11052000
New Contributor III

Iโ€™m working on a data usage use case and want to understand the right way to get read bytes and written bytes per table in Databricks, especially for Unity Catalog tables.

What I want

For each table, something like:

  • Date

  • Table name (catalog.schema.table)

  • Operation type (read/write)

  • User

  • Bytes read

  • Bytes written

Initially, I assumed I could get this directly from the UC audit logs via the system.access.audit system table, because thatโ€™s the โ€œaudit logโ€ source. But while exploring, I realized a few things are missing

  1. system.access.audit

    • Focused on governance and admin actions (e.g., grants, ownership changes, some access checks).

    • Great for โ€œwho did whatโ€ from a security perspective.

    • Does not expose per-table read bytes / written_bytes metrics.

Is there any direct way from system.access.audit to get per-table read/write bytes, or is audit intentionally scoped only to governance events (and not IO metrics)?

5 REPLIES 5

Hubert-Dudek
Databricks MVP

Actually, I was importing it from Azure Metrics and joining the folder with the location from information schema but maybe there is easier way


My blog: https://databrickster.medium.com/

Could you please provide some scripts or samples for that? It would be really helpful to understand.

Here I explained how I enabled diagnostic settings https://medium.com/@databrickster/hidden-benefit-of-databricks-managed-tables-f9ff8e1801ac


My blog: https://databrickster.medium.com/

balajij8
New Contributor

System Audit tables are for account activity tracking & security.

For IO details

  • You can use query history & lineage. Add usage attribution code to get details at table level
  • You can use table logs to get the write info along with other details.

pradeep_singh
New Contributor II

system.access.audit focuses on governance and admin/security events. It doesnโ€™t capture per-table I/O metrics such as read_bytes or written_bytes.

  • Use system.query.history for per-statement I/O metrics (read_bytes, written_bytes, read_rows, written_rows, etc.).
  • Use system.access.table_lineage to identify which Unity Catalog tables were read or written. Join on statement_id where available. 
    See if below query works for your need . 
-- Per-table read/write bytes, allocated across sources/targets per statement
WITH lineage AS (
  SELECT
    event_date,
    statement_id,
    created_by,
    source_table_full_name,
    target_table_full_name,
    source_type,
    target_type
  FROM system.access.table_lineage
  WHERE event_date BETWEEN date_sub(current_date(), 30) AND current_date()
),
qh AS (
  SELECT
    statement_id,
    executed_by,
    CAST(start_time AS DATE) AS date,
    read_bytes,
    written_bytes,
    execution_status
  FROM system.query.history
  WHERE execution_status = 'FINISHED'
)
, per_table AS (
  SELECT
    COALESCE(l.source_table_full_name, l.target_table_full_name) AS table_full_name,
    CASE
      WHEN l.source_type IS NOT NULL AND l.target_type IS NULL THEN 'read'
      WHEN l.target_type IS NOT NULL AND l.source_type IS NULL THEN 'write'
      WHEN l.source_type IS NOT NULL AND l.target_type IS NOT NULL THEN 'read_write'
      ELSE 'unknown'
    END AS operation_type,
    qh.executed_by AS user,
    qh.date AS date,
    -- allocate bytes across all sources/targets in the statement
    CASE
      WHEN l.source_type IS NOT NULL THEN
        qh.read_bytes / NULLIF(COUNT(l.source_table_full_name) OVER (PARTITION BY l.statement_id), 0)
      ELSE 0
    END AS bytes_read_alloc,
    CASE
      WHEN l.target_type IS NOT NULL THEN
        qh.written_bytes / NULLIF(COUNT(l.target_table_full_name) OVER (PARTITION BY l.statement_id), 0)
      ELSE 0
    END AS bytes_written_alloc
  FROM lineage l
  JOIN qh ON l.statement_id = qh.statement_id
)
SELECT
  date,
  table_full_name AS table,
  operation_type AS operation,
  user,
  SUM(bytes_read_alloc) AS bytes_read,
  SUM(bytes_written_alloc) AS bytes_written
FROM per_table
GROUP BY date, table_full_name, operation_type, user
ORDER BY date DESC;