Structured Streaming Event in Audit Logs

Hertz
New Contributor

I am trying to monitor when a table is created or updated using the audit logs, but I have found that Structured Streaming writes/appends are not captured there. Am I missing something? Shouldn't this be captured as a Unity Catalog event, either under the createTable or updateTables action name? Is there another way to get this from the audit logs?

1 REPLY

Kaniz
Community Manager

Hi @Hertz, monitoring table creation and updates using the audit logs is essential for maintaining data governance and security.

Let's explore this further.

Databricks, being a cloud-native platform, provides audit logs that allow administrators to track access to data and workspace resources. These logs capture various actions related to primary resources like clusters, jobs, and the workspace. However, you've correctly observed that structured streaming writes/appends are not explicitly captured in the audit logs by default.

Here are some insights and recommendations:

  1. Audit Log Delivery:

    • Databricks delivers audit logs for all enabled workspaces in JSON format to a customer-owned AWS S3 bucket.
    • Each event related to actions (e.g., creating tables, updating data) is logged as a separate record.
    • Relevant parameters are stored in a sparse StructType called requestParams.
  2. ETL Process for Audit Logs:

    • To make audit log information more accessible, consider implementing an ETL process based on Structured Streaming and Delta Lake (a minimal sketch follows right after this list).
    • Structured Streaming benefits:
      • It manages state efficiently, ensuring that only newly added audit log files are processed.
      • You can design streaming queries as daily jobs (pseudo-batch jobs).
    • Delta Lake advantages:
      • Maintains table state through its transaction log and checkpoint files.
      • Simplifies handling of data changes.
  3. Customize Your Solution:

    • Extend the audit log processing logic to handle structured streaming writes/appends.
    • While Databricks audit logs cover many scenarios, specific use cases may require customizations.
    • Consider using change data capture (CDC) techniques to capture streaming events (the Delta history sketch near the end of this reply shows one option).
  4. Unity Catalog Events:

    • As of now, structured streaming writes/appends are not explicitly labelled as createTable or updateTables actions in the audit logs.
    • You might need to create custom logic to identify these events based on other available information (the query sketch a little further below shows one way to filter for the actions that do appear).
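
To make points 1 and 2 concrete, here is a minimal PySpark sketch of such an audit-log ETL job. Treat it as an illustration only: the S3 paths and table name are placeholders for your environment, and the schema is trimmed to a few common top-level fields (the real audit log schema is wider, and requestParams is modelled here as a simple string map rather than the full sparse struct).

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType, MapType

# Trimmed-down audit log schema; reading requestParams as a string-to-string
# map keeps every key without declaring them all up front.
audit_schema = StructType([
    StructField("timestamp", LongType()),
    StructField("serviceName", StringType()),
    StructField("actionName", StringType()),
    StructField("requestParams", MapType(StringType(), StringType())),
    StructField("userIdentity", StructType([StructField("email", StringType())])),
])

raw_audit_path = "s3://my-audit-bucket/audit-logs/"           # placeholder path
checkpoint_path = "s3://my-audit-bucket/checkpoints/bronze"   # placeholder path
bronze_table = "audit.bronze_audit_logs"                      # placeholder table

# Structured Streaming remembers which JSON files it has already ingested via
# the checkpoint, so re-running this as a daily (pseudo-batch) job only picks
# up newly delivered audit log files. `spark` is the ambient SparkSession in a
# Databricks notebook.
query = (
    spark.readStream
        .schema(audit_schema)
        .json(raw_audit_path)
        .withColumn("event_time", (F.col("timestamp") / 1000).cast("timestamp"))  # assumes epoch milliseconds
        .writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)   # process whatever is new, then stop
        .toTable(bronze_table)
)
query.awaitTermination()   # block until the new files have been processed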

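Once that bronze table exists, a query along these lines surfaces the Unity Catalog table-lifecycle events that are emitted. This is the kind of custom filtering point 4 alludes to; the table name carries over from the sketch above, and the exact actionName values are an assumption to verify against your own logs (or against the system.access.audit system table, where system tables are enabled).

from pyspark.sql import functions as F

# Table lifecycle events that Unity Catalog does emit to the audit log. The
# action names below are an assumption to verify against your own workspace;
# streaming appends to an existing table will generally not show up here.
table_actions = ["createTable", "updateTables", "deleteTable"]

uc_table_events = (
    spark.table("audit.bronze_audit_logs")        # placeholder table from the ETL sketch
        .where(F.col("serviceName") == "unityCatalog")
        .where(F.col("actionName").isin(table_actions))
        .select(
            "event_time",
            "actionName",
            F.col("userIdentity.email").alias("actor"),
            F.col("requestParams"),               # inspect this map to find the table-name key
        )
        .orderBy(F.col("event_time").desc())
)

display(uc_table_events)   # display() is available in Databricks notebooks
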
Remember that audit logs are crucial for monitoring resource usage, identifying anti-patterns, and ensuring compliance. By combining Structured Streaming and Delta Lake, you can build a robust solution to track table creation and updates effectively.
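
And because the audit log does not record the streaming appends themselves, one workaround in the spirit of point 3 is to read the Delta transaction history of the tables you care about, since Delta records every commit, including those made by streaming writers. The table name and the operation names filtered on below are assumptions to check against your own DESCRIBE HISTORY output.

from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Delta's transaction log records every commit, including streaming appends
# that the audit log misses. "STREAMING UPDATE" is the operation name Delta
# typically records for Structured Streaming writes; confirm the exact values
# in your own table history.
monitored_table = "main.sales.orders"    # placeholder table to monitor

streaming_writes = (
    DeltaTable.forName(spark, monitored_table)
        .history()
        .where(F.col("operation").isin("STREAMING UPDATE", "WRITE", "MERGE"))
        .select("version", "timestamp", "operation", "operationMetrics", "userName")
        .orderBy(F.col("version").desc())
)

display(streaming_writes)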

For more detailed implementation steps, refer to the official Databricks blog post on monitoring your Databricks workspace with audit logs.

Keep exploring and enhancing your monitoring capabilities! 🚀

 