Is there a way to audit all tables in hive_metastore (no UC; all of them are external) to check when each one was last used (queried, updated, etc.)?
Apache Ranger or Apache Sentry can be used to audit Hive activity. If auditing is enabled in one of these tools, you can review its audit logs to see when each table was accessed. The logs are typically stored in a separate location, so check the documentation of the tool you are using for details. Alternatively, you can modify your Hive queries or scripts to record table access in a custom log file, which means adding logging statements to your Hive scripts or applications.
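To illustrate the custom log file idea, here is a minimal Python sketch. It is not part of Hive, Ranger, or Sentry: the function names `log_table_access` and `last_access_by_table`, the JSON-lines layout, and the field names `table`, `action`, and `ts` are all assumptions made up for this example. Scripts would call the first function whenever they touch a table, and the second function summarizes the most recent access per table.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_table_access(log_path, table, action, when=None):
    """Append one JSON line describing a table access (hypothetical format)."""
    when = when or datetime.now(timezone.utc)
    record = {"table": table, "action": action, "ts": when.isoformat()}
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def last_access_by_table(log_path):
    """Return {table: (latest timestamp, action)} read from the custom log."""
    latest = {}
    with Path(log_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            rec = json.loads(line)
            ts = datetime.fromisoformat(rec["ts"])
            # Keep only the most recent entry seen for each table.
            if rec["table"] not in latest or ts > latest[rec["table"]][0]:
                latest[rec["table"]] = (ts, rec["action"])
    return latest
```

Tables that never appear in the summary are candidates for cleanup, but only accesses made through code that calls `log_table_access` are recorded, so this approach covers scripts you control and nothing else.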
Thank you for the suggestions, I will check both. For the Hive scripts, however, it's nearly impossible, since tables are created and queried ad hoc from notebooks. No one is doing any cleanup, though, and I feel an audit is badly needed.