Databricks Community

Danish11052000 · ‎02-22-2026

’m trying to build a UC usage/refresh tracking table for every workspace. For each workspace, I want to know how many times a UC table was refreshed or accessed each month. To do this, I’m reading the Databricks audit logs and I need to extract only the correct full table name from the request_params field.

Sample Output Table (What I’m Trying to Build):

Just to show what I’m aiming for:

workspace_id	workspace_name	month (event date)	full_table_name	Refresh_count (count of distinct event date)
12345	ws-prod	2026-01-01	main.sales.orders	14
12345	ws-prod	2026-01-01	main.marketing.accounts	6
67890	ws-dev	2026-01-01	main.hr.employee_master	10

While parsing request_params, I’m seeing multiple different structures depending on the event type:

table_full_name
full_name_arg
securables (array with type and full_name)
separate fields like catalog, schema, and table

Because of this inconsistency, I’m not sure which one is the correct or recommended field to extract the full table name from. My goal is to capture a clean and reliable <catalog>.<schema>.<table> value for each refresh/access event.

My question:

What is the right or best‑practice approach to handle these different structures in request_params and determine which field should be treated as the authoritative full table name?

I want to avoid extracting the wrong name or parsing unnecessary fields, and I’d like to follow whatever Databricks considers the standard going forward.

Any guidance or recommendations would be appreciated.

pradeep_singh · ‎02-25-2026

Can you share the query/code you are running currently ?

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

View solution in original post

Danish11052000 · ‎03-01-2026

Thanks @pradeep_singh for the quick help!
Sharing my understanding so others can benefit:

Why I was getting inconsistent UC table names

In the Databricks system.access.audit logs, the table metadata can appear in multiple request_params fields, depending on the API/action used:

table_full_name – appears for most Delta/UC ops
full_arg_name – appears for some Catalog operations
Nested: catalog_name, schema_name, table_name – appears for granular resource access logs
table_url / url – appears for legacy paths or volume-like structures

Because these were populated differently across actions (getTable, listSummaries, updateTable, etc.), I saw inconsistent table names.

Correct approach: Use a prioritized COALESCE

To avoid missing any UC table, the right way is to extract the table name in priority order, falling back to the next field only if the previous one is NULL.

Example:

WITH workspace_details AS (

SELECT

'UC' as usage_type,

a.workspace_id,

w.workspace_name,

COALESCE(

a.request_params['full_arg_name'], -- Priority 1

a.request_params['table_full_name'], -- Priority 2

CONCAT(

a.request_params['catalog_name'], '.',

a.request_params['schema_name'], '.',

a.request_params['table_name']

), -- Priority 3

a.request_params['table_url'], -- Priority 4

a.request_params['url'] -- Priority 5 (fallback)

) AS full_table_name,

a.event_date,

COALESCE(a.request_params['table_url'], a.request_params['url']) AS path

FROM system.access.audit a

LEFT JOIN system.access.workspaces_latest w

ON a.workspace_id = w.workspace_id

WHERE event_date >= current_date() - {days}

)

Result

This approach ensures:

Consistent UC table identification
No more missing or mismatched table names across operations

Thanks again @pradeep_singh — this solution works much better now.

View solution in original post

SteveOstrowski · ‎03-08-2026

Hi @Danish11052000,

You are on the right track with the COALESCE approach. The reason for the inconsistency is that different Unity Catalog action types populate different keys in request_params. Here is a breakdown of the key fields and which actions use them, plus a refined query.

WHICH REQUEST_PARAMS KEYS HOLD THE TABLE NAME

1. full_name_arg: This is the most common key for Unity Catalog operations like getTable, updateTable, deleteTable, and getTableSummaries. It contains the full three-part name (catalog.schema.table). The Databricks documentation for the system.access.audit table schema specifically shows full_name_arg as the example key.

2. table_full_name: Used by some DML/write-path events and internal operations.

3. Separate fields (catalog_name, schema_name, table_name): Used by certain granular operations like createTable where the parts are passed individually.

4. name: Some events use a generic "name" key that may hold the full three-part name.

Note: One small correction to your self-answer, the documented key name is full_name_arg (not full_arg_name). Worth double-checking in your environment, but the official docs reference full_name_arg.

RECOMMENDED QUERY

Here is a refined version of the COALESCE approach with filtering for Unity Catalog table events and the aggregation to build your target output:

SELECT
  a.workspace_id,
  w.workspace_name,
  DATE_TRUNC('month', a.event_date) AS month,
  COALESCE(
      a.request_params['full_name_arg'],
      a.request_params['table_full_name'],
      CASE
          WHEN a.request_params['catalog_name'] IS NOT NULL
              AND a.request_params['schema_name'] IS NOT NULL
              AND a.request_params['table_name'] IS NOT NULL
          THEN CONCAT(
              a.request_params['catalog_name'], '.',
              a.request_params['schema_name'], '.',
              a.request_params['table_name']
          )
      END,
      a.request_params['name']
  ) AS full_table_name,
  COUNT(DISTINCT a.event_date) AS refresh_count
FROM system.access.audit a
LEFT JOIN system.information_schema.catalog_tags ct
  ON 1=0  -- placeholder, see workspace join below
LEFT JOIN system.access.workspaces_latest w
  ON a.workspace_id = w.workspace_id
WHERE a.service_name = 'unityCatalog'
  AND a.action_name IN (
      'getTable', 'createTable', 'deleteTable', 'updateTable',
      'getTableSummaries', 'listTables'
  )
  AND a.event_date >= DATE_TRUNC('month', CURRENT_DATE()) - INTERVAL 6 MONTHS
  AND a.response.status_code = 200
GROUP BY 1, 2, 3, 4
HAVING full_table_name IS NOT NULL
  AND full_table_name LIKE '%.%.%'
ORDER BY 1, 3, 4

A few things this query does:

1. Filters to service_name = 'unityCatalog' so you only get UC table events, not notebook or cluster events that also have request_params.
2. Filters to successful responses (status_code = 200) so you are not counting failed lookups.
3. Uses a CASE expression for the CONCAT fallback so it only fires when all three parts are present, avoiding partial names like "catalog_name.null.null".
4. The HAVING clause with the LIKE '%.%.%' filter ensures you only keep valid three-part names.
5. Aggregates by month with COUNT(DISTINCT event_date) to get your refresh_count metric.

DISCOVERING WHICH KEYS ARE AVAILABLE

If you want to see exactly which request_params keys contain table-like values in your environment, this exploration query is useful:

SELECT
  action_name,
  MAP_KEYS(request_params) AS param_keys,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('getTable', 'createTable', 'updateTable', 'deleteTable')
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
LIMIT 20

This will show you the actual keys present for each action type in your workspace, so you can verify the COALESCE priority order.

DOCUMENTATION REFERENCE

The system.access.audit table schema is documented here:
https://docs.databricks.com/en/admin/system-tables/audit-logs.html

The audit log event reference (which lists service names and action names):
https://docs.databricks.com/en/admin/account-settings/audit-logs.html

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

View solution in original post

pradeep_singh · ‎02-25-2026

Can you share the query/code you are running currently ?

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

Danish11052000 · ‎03-01-2026

Thanks @pradeep_singh for the quick help!
Sharing my understanding so others can benefit:

Why I was getting inconsistent UC table names

In the Databricks system.access.audit logs, the table metadata can appear in multiple request_params fields, depending on the API/action used:

table_full_name – appears for most Delta/UC ops
full_arg_name – appears for some Catalog operations
Nested: catalog_name, schema_name, table_name – appears for granular resource access logs
table_url / url – appears for legacy paths or volume-like structures

Because these were populated differently across actions (getTable, listSummaries, updateTable, etc.), I saw inconsistent table names.

Correct approach: Use a prioritized COALESCE

To avoid missing any UC table, the right way is to extract the table name in priority order, falling back to the next field only if the previous one is NULL.

Example:

WITH workspace_details AS (

SELECT

'UC' as usage_type,

a.workspace_id,

w.workspace_name,

COALESCE(

a.request_params['full_arg_name'], -- Priority 1

a.request_params['table_full_name'], -- Priority 2

CONCAT(

a.request_params['catalog_name'], '.',

a.request_params['schema_name'], '.',

a.request_params['table_name']

), -- Priority 3

a.request_params['table_url'], -- Priority 4

a.request_params['url'] -- Priority 5 (fallback)

) AS full_table_name,

a.event_date,

COALESCE(a.request_params['table_url'], a.request_params['url']) AS path

FROM system.access.audit a

LEFT JOIN system.access.workspaces_latest w

ON a.workspace_id = w.workspace_id

WHERE event_date >= current_date() - {days}

)

Result

This approach ensures:

Consistent UC table identification
No more missing or mismatched table names across operations

Thanks again @pradeep_singh — this solution works much better now.

pradeep_singh · ‎02-25-2026

See if this helps -

https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/data-classification#get-numbe...

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

Hubert-Dudek · ‎02-26-2026

I would use delta history for that.

My blog: https://databrickster.medium.com/

SteveOstrowski · ‎03-08-2026

Hi @Danish11052000,

You are on the right track with the COALESCE approach. The reason for the inconsistency is that different Unity Catalog action types populate different keys in request_params. Here is a breakdown of the key fields and which actions use them, plus a refined query.

WHICH REQUEST_PARAMS KEYS HOLD THE TABLE NAME

1. full_name_arg: This is the most common key for Unity Catalog operations like getTable, updateTable, deleteTable, and getTableSummaries. It contains the full three-part name (catalog.schema.table). The Databricks documentation for the system.access.audit table schema specifically shows full_name_arg as the example key.

2. table_full_name: Used by some DML/write-path events and internal operations.

3. Separate fields (catalog_name, schema_name, table_name): Used by certain granular operations like createTable where the parts are passed individually.

4. name: Some events use a generic "name" key that may hold the full three-part name.

Note: One small correction to your self-answer, the documented key name is full_name_arg (not full_arg_name). Worth double-checking in your environment, but the official docs reference full_name_arg.

RECOMMENDED QUERY

Here is a refined version of the COALESCE approach with filtering for Unity Catalog table events and the aggregation to build your target output:

SELECT
  a.workspace_id,
  w.workspace_name,
  DATE_TRUNC('month', a.event_date) AS month,
  COALESCE(
      a.request_params['full_name_arg'],
      a.request_params['table_full_name'],
      CASE
          WHEN a.request_params['catalog_name'] IS NOT NULL
              AND a.request_params['schema_name'] IS NOT NULL
              AND a.request_params['table_name'] IS NOT NULL
          THEN CONCAT(
              a.request_params['catalog_name'], '.',
              a.request_params['schema_name'], '.',
              a.request_params['table_name']
          )
      END,
      a.request_params['name']
  ) AS full_table_name,
  COUNT(DISTINCT a.event_date) AS refresh_count
FROM system.access.audit a
LEFT JOIN system.information_schema.catalog_tags ct
  ON 1=0  -- placeholder, see workspace join below
LEFT JOIN system.access.workspaces_latest w
  ON a.workspace_id = w.workspace_id
WHERE a.service_name = 'unityCatalog'
  AND a.action_name IN (
      'getTable', 'createTable', 'deleteTable', 'updateTable',
      'getTableSummaries', 'listTables'
  )
  AND a.event_date >= DATE_TRUNC('month', CURRENT_DATE()) - INTERVAL 6 MONTHS
  AND a.response.status_code = 200
GROUP BY 1, 2, 3, 4
HAVING full_table_name IS NOT NULL
  AND full_table_name LIKE '%.%.%'
ORDER BY 1, 3, 4

A few things this query does:

1. Filters to service_name = 'unityCatalog' so you only get UC table events, not notebook or cluster events that also have request_params.
2. Filters to successful responses (status_code = 200) so you are not counting failed lookups.
3. Uses a CASE expression for the CONCAT fallback so it only fires when all three parts are present, avoiding partial names like "catalog_name.null.null".
4. The HAVING clause with the LIKE '%.%.%' filter ensures you only keep valid three-part names.
5. Aggregates by month with COUNT(DISTINCT event_date) to get your refresh_count metric.

DISCOVERING WHICH KEYS ARE AVAILABLE

If you want to see exactly which request_params keys contain table-like values in your environment, this exploration query is useful:

SELECT
  action_name,
  MAP_KEYS(request_params) AS param_keys,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('getTable', 'createTable', 'updateTable', 'deleteTable')
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
LIMIT 20

This will show you the actual keys present for each action type in your workspace, so you can verify the COALESCE priority order.

DOCUMENTATION REFERENCE

The system.access.audit table schema is documented here:
https://docs.databricks.com/en/admin/system-tables/audit-logs.html

The audit log event reference (which lists service names and action names):
https://docs.databricks.com/en/admin/account-settings/audit-logs.html

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Danish11052000 · 2 weeks ago

Hi @SteveOstrowski . Thanks for explaining.. but getTable does not mean the user read or accessed the actual data.

What getTable actually means

It's a metadata API call — it fetches the table definition/properties, not the data itself.

is there a alternative way to fetch the details that user has access directly on the actual data to read I am looking specifically for?

SteveOstrowski · 2 weeks ago

Hi Danish, good push-back. You are right that getTable in the Unity Catalog audit logs is metadata-only: it fires when a client resolves a table reference or inspects its schema/properties, not when rows are actually scanned.

If you want a signal that a principal actually read data, the cleanest sources in system.access.audit are:

generateTemporaryTableCredential on the unityCatalog service. Unity Catalog issues a short-lived credential to read the underlying files whenever an engine is about to actually access table data (external readers, Delta Sharing recipients, serverless query engines, Lakehouse Federation, etc.). Filter on action_name = "generateTemporaryTableCredential" and pull request_params.table_full_name / request_params.operation (READ vs READ_WRITE). This is the strongest "the bytes were handed out" signal.
commandSubmit / commandFinish (clusters) and executeAdhocQuery / executeSavedQuery / queryStart (Databricks SQL warehouses). These include the actual commandText / statement, so you can confirm a SELECT against the table in question.
For Delta Sharing reads, look for deltaSharingQueriedTable and deltaSharingQueriedTableChanges.

The pragmatic pattern most folks land on: combine generateTemporaryTableCredential (the "credential was vended for this table" signal) with the query-execution events (the "this is the SQL that ran" signal), and join on request_id or by user_identity.email + time window. That gives you both who got read access and what they ran.

If you also want a higher-level view without parsing SQL, system.access.table_lineage records actual read/write lineage per user per table per source, which is often easier to query for "did user X read table Y in the last 30 days".

One caveat: external-engine reads via Iceberg REST Catalog or credential vending will show up as generateTemporaryTableCredential but you will not get the executed SQL, since the query ran outside Databricks. That is expected and is usually the best you can do for those paths.

Sources: