Data Governance

Databricks Cluster Logs: where can I find table usage information or queries?

Mado
Valued Contributor II

Hi,

I am trying to find the queries I run in a notebook (attached to a cluster) in the cluster logs.

I set the cluster to deliver logs to a folder on DBFS and I can read log files from there.

I created a Databricks workspace on the premium pricing tier; it is not enabled for Unity Catalog. Tables are stored in hive_metastore per my client's request.

I queried tables on specific days, but I cannot find the table names in the log files for those specific actions and timestamps.

I can see table names in the "log4j" log files, but it seems these entries relate to when I created the tables (based on the timestamps).

What I understand is that "log4j-active.log" contains the logs of the currently running cluster, i.e. the most recent logs. From time to time, Databricks archives the logs into separate gzipped files named "log4j-YYYY-MM-DD-HH.log.gz", for example "log4j-2023-02-22-10.log.gz".
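Given that naming scheme, matching an archive file to the hour a query ran can be scripted. Below is a minimal sketch in plain Python; `archives_for_hour` is a hypothetical helper and the file list is illustrative — in practice the names would come from listing the cluster's DBFS log folder.

```python
import re
from datetime import datetime

# Archive names follow "log4j-YYYY-MM-DD-HH.log.gz" per the naming
# scheme described above; "log4j-active.log" will not match.
ARCHIVE_RE = re.compile(r"log4j-(\d{4}-\d{2}-\d{2}-\d{2})\.log\.gz")

def archives_for_hour(filenames, target):
    """Return archive names whose embedded UTC hour equals `target`."""
    hits = []
    for name in filenames:
        m = ARCHIVE_RE.fullmatch(name)
        if m and datetime.strptime(m.group(1), "%Y-%m-%d-%H") == target:
            hits.append(name)
    return hits

# Illustrative file list, as if returned by listing the DBFS log folder.
files = ["log4j-2023-02-22-10.log.gz",
         "log4j-2023-02-22-11.log.gz",
         "log4j-active.log"]
print(archives_for_hour(files, datetime(2023, 2, 22, 10)))
# prints ['log4j-2023-02-22-10.log.gz']
```

Since archive timestamps are in UTC, convert your local query times before matching.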


Screenshot 

Please let me know where I can find information about table usage or queries (if there are any).

Also, note that I don't get logs for all activities. For example, I started the cluster on Feb 24th and ran a few queries, but there is no "log4j" file for that time, even accounting for the GMT time zone.

1 ACCEPTED SOLUTION


Anonymous
Not applicable

@Mohammad Saber:

If you are not seeing the expected logs in the log files, it's possible that either logging was not properly configured or the logs have been rotated out of the active log file into an archive file. Here are some suggestions for locating the logs for the queries you ran:

  1. Check the logging configuration: verify that log delivery was set up correctly when the cluster was created. You can check this on the cluster configuration page under the "Advanced Options" section.
  2. Check the archive files: As you noted, logs are periodically archived and stored in separate gzipped files. Check the archive files for the relevant time period to see if they contain the logs you are looking for. You can access the archive files by navigating to the DBFS folder where the logs are stored and searching for files with names like "log4j-YYYY-MM-DD-HH.log.gz". Note that you may need to unzip the files to view their contents.
  3. Check the audit logs: Databricks provides an audit logging feature that records user activity in the workspace, including who ran an action and when. Audit log delivery must be configured at the account level; once enabled, the logs are delivered as JSON records to a storage location you specify, where you can query them.
  4. Check the metastore logs: If your tables are stored in Hive Metastore, you may be able to find information about table usage and queries in the metastore logs. These logs can typically be found in the same directory as the metastore database, and may contain information about table creation, modification, and usage.
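Step 2 above can be sketched in plain Python. The `grep_gz` helper and the sample log line below are illustrative, not real Databricks output; on an actual cluster you would first copy the archive out of DBFS (e.g. with `dbutils.fs.cp` to local disk) and point the scan at that local copy.

```python
import gzip
import os
import re
import tempfile

def grep_gz(path, pattern):
    """Return lines of a gzipped text file that match `pattern`."""
    rx = re.compile(pattern)
    with gzip.open(path, "rt", errors="replace") as f:
        return [line.rstrip("\n") for line in f if rx.search(line)]

# Build a small fake archive to demonstrate the scan; the log lines
# here are fabricated for illustration.
sample = ("23/02/22 10:01:12 INFO HiveMetaStore: get_table : tbl=my_table\n"
          "23/02/22 10:01:13 INFO SomeOtherComponent: unrelated line\n")
path = os.path.join(tempfile.mkdtemp(), "log4j-2023-02-22-10.log.gz")
with gzip.open(path, "wt") as f:
    f.write(sample)

print(grep_gz(path, r"my_table"))
```

Searching for the table name (or a metastore call such as `get_table`) across all archives in the folder narrows down when a table was touched.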

If none of these suggestions help you locate the logs you are looking for, you may need to consult with the Databricks support team for further assistance.
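For suggestion 3, once audit log delivery is configured, the records arrive as JSON, one object per event. The sketch below filters them with the standard library; the field names `serviceName`, `actionName`, and `userIdentity` do appear in Databricks audit logs, but these sample records and their values are fabricated for illustration.

```python
import json

# Two fabricated audit records, in the newline-delimited JSON shape
# audit log delivery typically produces.
raw = (
    '{"serviceName": "notebook", "actionName": "runCommand", '
    '"userIdentity": {"email": "user@example.com"}}\n'
    '{"serviceName": "clusters", "actionName": "start", '
    '"userIdentity": {"email": "admin@example.com"}}'
)

def actions_by_service(lines, service):
    """Return (user email, action) pairs for records from one service."""
    out = []
    for line in lines.splitlines():
        rec = json.loads(line)
        if rec["serviceName"] == service:
            out.append((rec["userIdentity"]["email"], rec["actionName"]))
    return out

print(actions_by_service(raw, "notebook"))
# prints [('user@example.com', 'runCommand')]
```

In practice you would load the delivered files with Spark (`spark.read.json(...)`) rather than line by line, but the filtering idea is the same.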


