04-03-2023 08:04 PM
I want to be able to view a listing of any or all of the following:
- When Notebooks were attached to or detached from a DS&E cluster
- When Notebook code was executed on a DS&E cluster
- What specific Notebook cell code was executed on a DS&E cluster
Is this currently possible?
I have explored the Clusters and Jobs/Runs APIs; however, these appear to cover only jobs/workflows, not ad hoc notebook-executed code.
While the functionality I'm after appears to be available on Databricks SQL warehouses, I need the same functionality for DS&E clusters.
The reason for this requirement is to determine which code and notebook ran immediately before resize and expanded-disk-size events on a specific DS&E cluster.
Thanks.
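For the resize and expanded-disk side specifically, the existing Clusters API can already list those lifecycle events. A minimal sketch, assuming databricks-cli is installed, DATABRICKS_HOST / DATABRICKS_TOKEN and a unix_start / unix_end window are defined, and the cluster name is a placeholder:
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi

api_client = ApiClient(host=DATABRICKS_HOST, token=DATABRICKS_TOKEN)
clusters_api = ClusterApi(api_client)
cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

# RESIZING and EXPANDED_DISK are documented cluster event types; get_events takes
# (cluster_id, start_time, end_time, order, event_types, offset, limit).
resize_events = clusters_api.get_events(cluster_id, unix_start, unix_end, 'ASC', ['RESIZING', 'EXPANDED_DISK'], 0, 500).get('events', [])
for e in resize_events:
    print(e.get('timestamp'), e.get('type'))
This gives the event timestamps to correlate against, even though it does not identify the triggering notebook.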
Accepted Solutions
04-04-2023 11:02 AM
From the UI, the best way to check is version control: https://docs.databricks.com/notebooks/notebooks-code.html#version-control
BTW, does this help: https://www.databricks.com/blog/2022/11/02/monitoring-notebook-command-logs-static-analysis-tools.ht... @Cameron McPherson?
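For context on the blog post above: it relies on verbose audit logging, which (once enabled) records each executed notebook command as a notebook/runCommand audit event, and notebook attach/detach as attachNotebook/detachNotebook events that include the cluster ID. A rough sketch of querying delivered audit logs with Spark; the storage path is a placeholder and the column names follow the documented audit log schema, so verify them against your own delivery format:
# Placeholder path -- point this at your configured audit-log delivery location.
audit_logs = spark.read.json("s3://my-bucket/databricks-audit-logs/")

# Notebook attach/detach events carry the cluster in requestParams.clusterId.
attach_detach = (
    audit_logs
    .where("serviceName = 'notebook' AND actionName IN ('attachNotebook', 'detachNotebook')")
    .selectExpr("timestamp", "actionName", "requestParams.clusterId", "requestParams.notebookId")
)

# With verbose audit logs enabled, each executed cell appears as runCommand,
# including the command text itself.
run_commands = (
    audit_logs
    .where("serviceName = 'notebook' AND actionName = 'runCommand'")
    .selectExpr("timestamp", "requestParams.notebookId", "requestParams.commandText")
)

attach_detach.show(truncate=False)
run_commands.show(truncate=False)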
04-03-2023 10:06 PM
Hi, if you are asking about listing this through the UI, then it is not currently available.
Please tag @Debayan in your next response, which will notify me. Thank you!
04-03-2023 10:22 PM
@Debayan Mukherjee
Correct - some kind of API access would be good for this, e.g. the code below.
That way, I would be able to construct a DataFrame of all queries made against a specified cluster, or at least determine which cells / notebooks were attached to and executed on the cluster at specific date-times.
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi
from databricks_cli.<<module>>.api import ClusterHistoryApi  # hypothetical: no such API exists today

api_client = ApiClient(
    host=DATABRICKS_HOST,
    token=DATABRICKS_TOKEN
)

clusters_api = ClusterApi(api_client)
cluster_history_api = ClusterHistoryApi(api_client)  # ie: an API that would provide history access to DS&E clusters

cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

# ie: history of all code segments / cells / notebooks executed on the specified DS&E cluster:
cluster_code_exec_history = cluster_history_api.get_events(cluster_id, unix_start, unix_end, 'ASC', '', 0, 500).get('code_execution_history')

df = spark.read.json(sc.parallelize(cluster_code_exec_history))  # profit
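One caveat on the sketch above: as far as I know, no code_execution_history call exists in databricks-cli today; the closest real call is ClusterApi.get_events, which returns cluster lifecycle events rather than executed code. Continuing from the objects defined above, those events can still be landed in a DataFrame in the same spirit (serializing to JSON strings first, since spark.read.json expects an RDD of strings):
import json

lifecycle_events = clusters_api.get_events(cluster_id, unix_start, unix_end, 'ASC', [], 0, 500).get('events', [])
events_df = spark.read.json(sc.parallelize([json.dumps(e) for e in lifecycle_events]))
events_df.select('timestamp', 'type').show(truncate=False)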
04-04-2023 04:33 PM
@Atanu Sarkar Yes, your proposal will work - thank you.

