Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How should I correctly extract the full table name from request_params in audit logs?

Danish11052000
Contributor

I’m trying to build a UC usage/refresh tracking table for every workspace. For each workspace, I want to know how many times a UC table was refreshed or accessed each month. To do this, I’m reading the Databricks audit logs and I need to extract only the correct full table name from the request_params field.

Sample Output Table (What I’m Trying to Build):

Just to show what I’m aiming for:

| workspace_id | workspace_name | month (event date) | full_table_name | Refresh_count (count of distinct event dates) |
|---|---|---|---|---|
| 12345 | ws-prod | 2026-01-01 | main.sales.orders | 14 |
| 12345 | ws-prod | 2026-01-01 | main.marketing.accounts | 6 |
| 67890 | ws-dev | 2026-01-01 | main.hr.employee_master | 10 |

While parsing request_params, I’m seeing multiple different structures depending on the event type:

  • table_full_name
  • full_name_arg
  • securables (array with type and full_name)
  • separate fields like catalog, schema, and table

Because of this inconsistency, I’m not sure which one is the correct or recommended field to extract the full table name from. My goal is to capture a clean and reliable <catalog>.<schema>.<table> value for each refresh/access event.

My question:

What is the right or best‑practice approach to handle these different structures in request_params and determine which field should be treated as the authoritative full table name?

I want to avoid extracting the wrong name or parsing unnecessary fields, and I’d like to follow whatever Databricks considers the standard going forward.

Any guidance or recommendations would be appreciated.

Accepted Solution

Ashwin_DSA
Databricks Employee

Hi @Danish11052000,

Is there a reason you prefer building your own table for this? I'm asking because there are simpler and more reliable patterns than hand-parsing.

If the account has system tables enabled, you can query system.access.audit directly instead of ingesting raw logs to your own table. It already exposes:

  • workspace_id – the workspace that triggered the event.
  • service_name, action_name, event_time, event_date – for filtering by UC events and time.
  • request_params (MAP<STRING,STRING>) – all the RPC parameters you are currently parsing.

System tables are available in the system catalog and include both system.access.audit (audit logs) and system.access.table_lineage (per-table read/write events).
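For example, a quick way to browse recent UC events and see which request_params keys each action populates (the 30-day window and row limit are just illustrative):

```sql
-- Sketch: inspect recent Unity Catalog audit events
SELECT event_date, workspace_id, action_name, request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= DATE_SUB(CURRENT_DATE(), 30)
LIMIT 100;
```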

If you can use these:

  • You don’t need to maintain your own ingestion pipeline.
  • You get 365 days of history out of the box (for audit + lineage).

You can still layer your monthly UC usage/refresh metrics on top with a simple SQL view or table.
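As a sketch, the monthly rollup behind your sample output could look like the following, assuming a view `uc_audit_events` (name illustrative) that already exposes a normalized `table_full_name`. Note that `workspace_name` would need a separate lookup you maintain, since the audit table only carries `workspace_id`:

```sql
-- Hypothetical monthly rollup; assumes a view `uc_audit_events` exposing
-- workspace_id, event_date, and a normalized table_full_name column
CREATE OR REPLACE VIEW uc_monthly_table_usage AS
SELECT
  workspace_id,
  DATE_TRUNC('MONTH', event_date) AS month,
  table_full_name                 AS full_table_name,
  COUNT(DISTINCT event_date)      AS refresh_count  -- distinct days, per the sample output
FROM uc_audit_events
WHERE table_full_name IS NOT NULL
GROUP BY workspace_id, DATE_TRUNC('MONTH', event_date), table_full_name;
```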

As you pointed out, the challenge is that different keys are populated in request_params depending on the event type. You can coalesce all of them into a single value; an example is given below.

-- in system.access.audit
COALESCE(
  request_params.table_full_name,
  request_params.full_name_arg,
  request_params.securable_full_name,
  -- CONCAT returns NULL if any argument is NULL, so this branch only
  -- produces a value when all three separate fields are present
  CONCAT(
    request_params.catalog_name, '.',
    request_params.schema_name, '.',
    request_params.name
  ),
  -- securables is a JSON array; take the first entry's full_name
  FROM_JSON(
    request_params.securables,
    'ARRAY<STRUCT<type: STRING, full_name: STRING>>'
  )[0].full_name
) AS table_full_name

So instead of “extracting the correct table name” differently per action_name, you can:

  1. Put this exact COALESCE expression in a view over system.access.audit (or your own logs table).
  2. Always reference table_full_name from that view.

Hope this helps!

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

