cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How should I correctly extract the full table name from request_params in audit logs?

Danish11052000
Contributor

’m trying to build a UC usage/refresh tracking table for every workspace. For each workspace, I want to know how many times a UC table was refreshed or accessed each month. To do this, I’m reading the Databricks audit logs and I need to extract only the correct full table name from the request_params field.

Sample Output Table (What I’m Trying to Build):

Just to show what I’m aiming for:

 
workspace_idworkspace_namemonth (event date)full_table_nameRefresh_count (count of distinct event date)
12345
ws-prod
2026-01-01
main.sales.orders
14
12345
ws-prod
2026-01-01
main.marketing.accounts
6
67890
ws-dev
2026-01-01
main.hr.employee_master
10

 

While parsing request_params, I’m seeing multiple different structures depending on the event type:

  • table_full_name
  • full_name_arg
  • securables (array with type and full_name)
  • separate fields like catalog, schema, and table

Because of this inconsistency, I’m not sure which one is the correct or recommended field to extract the full table name from. My goal is to capture a clean and reliable <catalog>.<schema>.<table> value for each refresh/access event.

My question:

What is the right or best‑practice approach to handle these different structures in request_params and determine which field should be treated as the authoritative full table name?

I want to avoid extracting the wrong name or parsing unnecessary fields, and I’d like to follow whatever Databricks considers the standard going forward.

Any guidance or recommendations would be appreciated.

3 ACCEPTED SOLUTIONS

Accepted Solutions

pradeep_singh
Contributor

Can you share the query/code you are running currently ?

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

View solution in original post

Thanks @pradeep_singh for the quick help!
Sharing my understanding so others can benefit:

 Why I was getting inconsistent UC table names

In the Databricks system.access.audit logs, the table metadata can appear in multiple request_params fields, depending on the API/action used:

  • table_full_name – appears for most Delta/UC ops
  • full_arg_name – appears for some Catalog operations
  • Nested: catalog_name, schema_name, table_name – appears for granular resource access logs
  • table_url / url – appears for legacy paths or volume-like structures

Because these were populated differently across actions (getTable, listSummaries, updateTable, etc.), I saw inconsistent table names.

Correct approach: Use a prioritized COALESCE

To avoid missing any UC table, the right way is to extract the table name in priority order, falling back to the next field only if the previous one is NULL.

Example:

WITH workspace_details AS (
    SELECT
        'UC' as usage_type,
        a.workspace_id,
        w.workspace_name,
        COALESCE(
            a.request_params['full_arg_name'],          -- Priority 1
            a.request_params['table_full_name'],        -- Priority 2
            CONCAT(
                a.request_params['catalog_name'], '.',
                a.request_params['schema_name'], '.',
                a.request_params['table_name']
            ),                                          -- Priority 3
            a.request_params['table_url'],              -- Priority 4
            a.request_params['url']                     -- Priority 5 (fallback)
        ) AS full_table_name,
        a.event_date,
        COALESCE(a.request_params['table_url'], a.request_params['url']) AS path
    FROM system.access.audit a
    LEFT JOIN system.access.workspaces_latest w
        ON a.workspace_id = w.workspace_id
    WHERE event_date >= current_date() - {days}
)

Result

This approach ensures:

  • Consistent UC table identification
  • No more missing or mismatched table names across operations

Thanks again @pradeep_singh  — this solution works much better now.

View solution in original post

SteveOstrowski
Databricks Employee
Databricks Employee

Hi @Danish11052000,

You are on the right track with the COALESCE approach. The reason for the inconsistency is that different Unity Catalog action types populate different keys in request_params. Here is a breakdown of the key fields and which actions use them, plus a refined query.

WHICH REQUEST_PARAMS KEYS HOLD THE TABLE NAME

1. full_name_arg: This is the most common key for Unity Catalog operations like getTable, updateTable, deleteTable, and getTableSummaries. It contains the full three-part name (catalog.schema.table). The Databricks documentation for the system.access.audit table schema specifically shows full_name_arg as the example key.

2. table_full_name: Used by some DML/write-path events and internal operations.

3. Separate fields (catalog_name, schema_name, table_name): Used by certain granular operations like createTable where the parts are passed individually.

4. name: Some events use a generic "name" key that may hold the full three-part name.

Note: One small correction to your self-answer, the documented key name is full_name_arg (not full_arg_name). Worth double-checking in your environment, but the official docs reference full_name_arg.

RECOMMENDED QUERY

Here is a refined version of the COALESCE approach with filtering for Unity Catalog table events and the aggregation to build your target output:

SELECT
  a.workspace_id,
  w.workspace_name,
  DATE_TRUNC('month', a.event_date) AS month,
  COALESCE(
      a.request_params['full_name_arg'],
      a.request_params['table_full_name'],
      CASE
          WHEN a.request_params['catalog_name'] IS NOT NULL
              AND a.request_params['schema_name'] IS NOT NULL
              AND a.request_params['table_name'] IS NOT NULL
          THEN CONCAT(
              a.request_params['catalog_name'], '.',
              a.request_params['schema_name'], '.',
              a.request_params['table_name']
          )
      END,
      a.request_params['name']
  ) AS full_table_name,
  COUNT(DISTINCT a.event_date) AS refresh_count
FROM system.access.audit a
LEFT JOIN system.information_schema.catalog_tags ct
  ON 1=0  -- placeholder, see workspace join below
LEFT JOIN system.access.workspaces_latest w
  ON a.workspace_id = w.workspace_id
WHERE a.service_name = 'unityCatalog'
  AND a.action_name IN (
      'getTable', 'createTable', 'deleteTable', 'updateTable',
      'getTableSummaries', 'listTables'
  )
  AND a.event_date >= DATE_TRUNC('month', CURRENT_DATE()) - INTERVAL 6 MONTHS
  AND a.response.status_code = 200
GROUP BY 1, 2, 3, 4
HAVING full_table_name IS NOT NULL
  AND full_table_name LIKE '%.%.%'
ORDER BY 1, 3, 4

A few things this query does:

1. Filters to service_name = 'unityCatalog' so you only get UC table events, not notebook or cluster events that also have request_params.
2. Filters to successful responses (status_code = 200) so you are not counting failed lookups.
3. Uses a CASE expression for the CONCAT fallback so it only fires when all three parts are present, avoiding partial names like "catalog_name.null.null".
4. The HAVING clause with the LIKE '%.%.%' filter ensures you only keep valid three-part names.
5. Aggregates by month with COUNT(DISTINCT event_date) to get your refresh_count metric.

DISCOVERING WHICH KEYS ARE AVAILABLE

If you want to see exactly which request_params keys contain table-like values in your environment, this exploration query is useful:

SELECT
  action_name,
  MAP_KEYS(request_params) AS param_keys,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('getTable', 'createTable', 'updateTable', 'deleteTable')
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
LIMIT 20

This will show you the actual keys present for each action type in your workspace, so you can verify the COALESCE priority order.

DOCUMENTATION REFERENCE

The system.access.audit table schema is documented here:
https://docs.databricks.com/en/admin/system-tables/audit-logs.html

The audit log event reference (which lists service names and action names):
https://docs.databricks.com/en/admin/account-settings/audit-logs.html

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

View solution in original post

5 REPLIES 5

pradeep_singh
Contributor

Can you share the query/code you are running currently ?

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

Thanks @pradeep_singh for the quick help!
Sharing my understanding so others can benefit:

 Why I was getting inconsistent UC table names

In the Databricks system.access.audit logs, the table metadata can appear in multiple request_params fields, depending on the API/action used:

  • table_full_name – appears for most Delta/UC ops
  • full_arg_name – appears for some Catalog operations
  • Nested: catalog_name, schema_name, table_name – appears for granular resource access logs
  • table_url / url – appears for legacy paths or volume-like structures

Because these were populated differently across actions (getTable, listSummaries, updateTable, etc.), I saw inconsistent table names.

Correct approach: Use a prioritized COALESCE

To avoid missing any UC table, the right way is to extract the table name in priority order, falling back to the next field only if the previous one is NULL.

Example:

WITH workspace_details AS (
    SELECT
        'UC' as usage_type,
        a.workspace_id,
        w.workspace_name,
        COALESCE(
            a.request_params['full_arg_name'],          -- Priority 1
            a.request_params['table_full_name'],        -- Priority 2
            CONCAT(
                a.request_params['catalog_name'], '.',
                a.request_params['schema_name'], '.',
                a.request_params['table_name']
            ),                                          -- Priority 3
            a.request_params['table_url'],              -- Priority 4
            a.request_params['url']                     -- Priority 5 (fallback)
        ) AS full_table_name,
        a.event_date,
        COALESCE(a.request_params['table_url'], a.request_params['url']) AS path
    FROM system.access.audit a
    LEFT JOIN system.access.workspaces_latest w
        ON a.workspace_id = w.workspace_id
    WHERE event_date >= current_date() - {days}
)

Result

This approach ensures:

  • Consistent UC table identification
  • No more missing or mismatched table names across operations

Thanks again @pradeep_singh  — this solution works much better now.

pradeep_singh
Contributor

See if this helps - 

https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/data-classification#get-numbe...

 

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

Hubert-Dudek
Databricks MVP

I would use delta history for that.


My blog: https://databrickster.medium.com/

SteveOstrowski
Databricks Employee
Databricks Employee

Hi @Danish11052000,

You are on the right track with the COALESCE approach. The reason for the inconsistency is that different Unity Catalog action types populate different keys in request_params. Here is a breakdown of the key fields and which actions use them, plus a refined query.

WHICH REQUEST_PARAMS KEYS HOLD THE TABLE NAME

1. full_name_arg: This is the most common key for Unity Catalog operations like getTable, updateTable, deleteTable, and getTableSummaries. It contains the full three-part name (catalog.schema.table). The Databricks documentation for the system.access.audit table schema specifically shows full_name_arg as the example key.

2. table_full_name: Used by some DML/write-path events and internal operations.

3. Separate fields (catalog_name, schema_name, table_name): Used by certain granular operations like createTable where the parts are passed individually.

4. name: Some events use a generic "name" key that may hold the full three-part name.

Note: One small correction to your self-answer, the documented key name is full_name_arg (not full_arg_name). Worth double-checking in your environment, but the official docs reference full_name_arg.

RECOMMENDED QUERY

Here is a refined version of the COALESCE approach with filtering for Unity Catalog table events and the aggregation to build your target output:

SELECT
  a.workspace_id,
  w.workspace_name,
  DATE_TRUNC('month', a.event_date) AS month,
  COALESCE(
      a.request_params['full_name_arg'],
      a.request_params['table_full_name'],
      CASE
          WHEN a.request_params['catalog_name'] IS NOT NULL
              AND a.request_params['schema_name'] IS NOT NULL
              AND a.request_params['table_name'] IS NOT NULL
          THEN CONCAT(
              a.request_params['catalog_name'], '.',
              a.request_params['schema_name'], '.',
              a.request_params['table_name']
          )
      END,
      a.request_params['name']
  ) AS full_table_name,
  COUNT(DISTINCT a.event_date) AS refresh_count
FROM system.access.audit a
LEFT JOIN system.information_schema.catalog_tags ct
  ON 1=0  -- placeholder, see workspace join below
LEFT JOIN system.access.workspaces_latest w
  ON a.workspace_id = w.workspace_id
WHERE a.service_name = 'unityCatalog'
  AND a.action_name IN (
      'getTable', 'createTable', 'deleteTable', 'updateTable',
      'getTableSummaries', 'listTables'
  )
  AND a.event_date >= DATE_TRUNC('month', CURRENT_DATE()) - INTERVAL 6 MONTHS
  AND a.response.status_code = 200
GROUP BY 1, 2, 3, 4
HAVING full_table_name IS NOT NULL
  AND full_table_name LIKE '%.%.%'
ORDER BY 1, 3, 4

A few things this query does:

1. Filters to service_name = 'unityCatalog' so you only get UC table events, not notebook or cluster events that also have request_params.
2. Filters to successful responses (status_code = 200) so you are not counting failed lookups.
3. Uses a CASE expression for the CONCAT fallback so it only fires when all three parts are present, avoiding partial names like "catalog_name.null.null".
4. The HAVING clause with the LIKE '%.%.%' filter ensures you only keep valid three-part names.
5. Aggregates by month with COUNT(DISTINCT event_date) to get your refresh_count metric.

DISCOVERING WHICH KEYS ARE AVAILABLE

If you want to see exactly which request_params keys contain table-like values in your environment, this exploration query is useful:

SELECT
  action_name,
  MAP_KEYS(request_params) AS param_keys,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('getTable', 'createTable', 'updateTable', 'deleteTable')
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
LIMIT 20

This will show you the actual keys present for each action type in your workspace, so you can verify the COALESCE priority order.

DOCUMENTATION REFERENCE

The system.access.audit table schema is documented here:
https://docs.databricks.com/en/admin/system-tables/audit-logs.html

The audit log event reference (which lists service names and action names):
https://docs.databricks.com/en/admin/account-settings/audit-logs.html

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.