Databricks, being a cloud-native platform, provides audit logs that allow administrators to track access to data and workspace resources. These logs capture various actions related to primary resources like clusters, jobs, and the workspace. However, you've correctly observed that structured streaming writes/appends are not explicitly captured in the audit logs by default.
Here are some insights and recommendations:
- Audit Log Delivery:
  - Databricks delivers audit logs for all enabled workspaces in JSON format to a customer-owned AWS S3 bucket.
  - Each event related to actions (e.g., creating tables, updating data) is logged as a separate record.
  - Relevant parameters are stored in a sparse StructType called `requestParams`.
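Because `requestParams` is sparse, a parser should treat every key as optional. Here is a minimal pure-Python sketch of flattening one audit-log line; the sample record and the `full_name_arg` key are illustrative assumptions, not the exact schema:

```python
import json

# Hypothetical audit-log record shaped like the JSON Databricks delivers:
# top-level fields plus a sparse requestParams whose keys vary by action.
sample_line = json.dumps({
    "timestamp": 1700000000000,
    "serviceName": "unityCatalog",
    "actionName": "createTable",
    "requestParams": {"full_name_arg": "main.sales.orders"},
})

def parse_audit_record(line: str) -> dict:
    """Flatten one audit-log JSON line into the fields we care about.

    requestParams is sparse -- which keys are present depends on the
    action -- so use .get() with defaults rather than a fixed schema.
    """
    record = json.loads(line)
    params = record.get("requestParams") or {}
    return {
        "action": record.get("actionName"),
        "service": record.get("serviceName"),
        "table": params.get("full_name_arg"),  # absent for non-table actions
    }

parsed = parse_audit_record(sample_line)
print(parsed["action"], parsed["table"])  # createTable main.sales.orders
```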
- ETL Process for Audit Logs:
  - To make audit log information more accessible, consider implementing an ETL process based on Structured Streaming and Delta Lake.
  - Structured Streaming benefits:
    - It manages state efficiently, ensuring that only newly added audit log files are processed.
    - You can design streaming queries as daily jobs (pseudo-batch jobs).
  - Delta Lake advantages:
    - Helps maintain the state of tables using write-ahead logs and checkpoints.
    - Simplifies handling of data changes.
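The pseudo-batch pattern above can be sketched as a PySpark job: the streaming checkpoint remembers which files were already ingested, and `trigger(availableNow=True)` drains all new data and then stops, so the query can run on a daily schedule. The paths, table name, and trimmed-down schema are assumptions for illustration:

```python
def run_audit_log_etl(source_path: str, target_table: str, checkpoint_path: str) -> None:
    """Sketch of a daily pseudo-batch ETL over raw audit-log JSON files."""
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (LongType, MapType, StringType,
                                   StructField, StructType)

    spark = SparkSession.builder.getOrCreate()

    # Streaming file sources need an explicit schema; requestParams is
    # modelled as a map because its keys are sparse and action-dependent.
    schema = StructType([
        StructField("timestamp", LongType()),
        StructField("serviceName", StringType()),
        StructField("actionName", StringType()),
        StructField("requestParams", MapType(StringType(), StringType())),
    ])

    (
        spark.readStream.schema(schema).json(source_path)
        .writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)  # tracks processed files
        .trigger(availableNow=True)                     # drain new data, then stop
        .toTable(target_table)
    )
```

`availableNow` requires Spark 3.3+; on older runtimes, `trigger(once=True)` behaves similarly for this use case.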
- Customize Your Solution:
  - Extend the audit log processing logic to handle structured streaming writes/appends.
  - While Databricks audit logs cover many scenarios, specific use cases may require customizations.
  - Consider using change data capture (CDC) techniques to capture streaming events.
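One CDC option is Delta Lake's Change Data Feed, which records row-level changes alongside each commit, so appends land in the feed even when the audit log does not label them. A sketch, assuming the target table was created with the `delta.enableChangeDataFeed = true` table property:

```python
def read_table_changes(table_name: str, starting_version: int = 0):
    """Read row-level changes from a Delta table via Change Data Feed.

    Each returned row carries _change_type, _commit_version and
    _commit_timestamp columns in addition to the table's own columns.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table_name)
    )
```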
- Unity Catalog Events:
  - As of now, structured streaming writes/appends are not explicitly labelled as `createTable` or `updateTables` actions in the audit logs.
  - You might need to create custom logic to identify these events based on other available information.
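One source of "other available information" is the Delta transaction log itself: `DESCRIBE HISTORY` reports an `operation` per commit, and streaming writes typically appear as `STREAMING UPDATE`. A sketch of such custom logic; treat the exact operation names as an assumption to verify against your own tables' history output:

```python
def is_streaming_write(operation: str) -> bool:
    """Classify a Delta history `operation` value as a streaming append.

    'STREAMING UPDATE' is the operation name assumed here for
    structured-streaming writes; verify it against DESCRIBE HISTORY
    on your own tables before relying on it.
    """
    return operation.upper() == "STREAMING UPDATE"

def streaming_writes(table_name: str):
    """Return the Delta history rows for streaming writes to one table."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    history = spark.sql(f"DESCRIBE HISTORY {table_name}")
    return history.where(history.operation == "STREAMING UPDATE")
```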
Remember that audit logs are crucial for monitoring resource usage, identifying anti-patterns, and ensuring compliance. By combining Structured Streaming and Delta Lake, you can build a robust solution to track table creation and updates effectively.
For more detailed implementation steps, refer to the official Databricks blog post on monitoring your Databricks workspace with audit logs.
Keep exploring and enhancing your monitoring capabilities!