Machine Learning

History of code executed on Data Science & Engineering service clusters

rendorHaevyn
New Contributor III

I want to be able to view a listing of any or all of the following:

  • When Notebooks were attached / detached to and from a DS&E cluster
  • When Notebook code was executed on a DS&E cluster
  • What Notebook specific cell code was executed on a DS&E cluster

Is this currently possible?

I have explored the Clusters and Jobs/Runs APIs; however, these appear to cover only jobs/workflows, not ad-hoc code executed from notebooks.

While it appears that the functionality that I'm after is available on Databricks SQL service warehouses, I need the same functionality for DS&E clusters.
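For reference, on SQL warehouses this is surfaced by the Query History API (`GET /api/2.0/sql/history/queries`). Below is a minimal sketch of calling it; the flattened `filter_by.*` parameter names and the `res` response field are taken from the public REST reference and may need adjusting, and `host`/`token` are placeholders for your workspace URL and a personal access token:

```python
import json
import urllib.request
from urllib.parse import urlencode

def build_history_params(start_ms: int, end_ms: int, max_results: int = 100) -> dict:
    """Query-string parameters for the SQL Query History endpoint."""
    return {
        "filter_by.query_start_time_range.start_time_ms": start_ms,
        "filter_by.query_start_time_range.end_time_ms": end_ms,
        "max_results": max_results,
    }

def list_warehouse_queries(host: str, token: str, start_ms: int, end_ms: int) -> list:
    """Fetch SQL-warehouse query history entries in a time window."""
    url = f"{host}/api/2.0/sql/history/queries?" + urlencode(
        build_history_params(start_ms, end_ms)
    )
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        # Each entry includes the query text, user, and timing fields
        return json.load(resp).get("res", [])
```

No equivalent per-cell execution history endpoint is documented for DS&E (all-purpose) clusters, which is the gap this question is about.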

The reason for this requirement is to determine which code and notebook executions immediately preceded resize and disk-expansion events on a specific DS&E cluster.
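The sizing events themselves can be pulled from the Cluster Events API (`POST /api/2.0/clusters/events`), which gives the timestamps to correlate against. A sketch under the assumption that the event-type names (`RESIZING`, `UPSIZE_COMPLETED`, `EXPANDED_DISK`) match the current REST reference; `host`/`token` are placeholders:

```python
import json
import urllib.request

def build_events_payload(cluster_id: str, start_ms: int, end_ms: int,
                         limit: int = 500) -> dict:
    """Request body for POST /api/2.0/clusters/events, filtered to sizing events."""
    return {
        "cluster_id": cluster_id,
        "start_time": start_ms,   # epoch milliseconds
        "end_time": end_ms,
        "event_types": ["RESIZING", "UPSIZE_COMPLETED", "EXPANDED_DISK"],
        "order": "ASC",
        "limit": limit,
    }

def get_sizing_events(host: str, token: str, cluster_id: str,
                      start_ms: int, end_ms: int) -> list:
    """Fetch resize / disk-expansion events for one cluster in a time window."""
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/events",
        data=json.dumps(build_events_payload(cluster_id, start_ms, end_ms)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("events", [])
```

Each returned event carries a `timestamp`, so the open question is only the other half of the join: what notebook code ran just before each event.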

Thanks.

1 ACCEPTED SOLUTION


Atanu
Esteemed Contributor
4 REPLIES

Debayan
Esteemed Contributor III

Hi, if you are asking about listing this through the UI, that is not currently available.

Please tag @Debayan in your next response, which will notify me. Thank you!

rendorHaevyn
New Contributor III

@Debayan Mukherjee

Correct - some kind of API access would be good for this, e.g. the code below.

So, I would be able to construct a dataframe of all queries run against a specified cluster, or at least determine which cells/notebooks were attached to and executed on the cluster within a specific date-time range.

from databricks_cli.sdk.api_client import ApiClient
 
from databricks_cli.<<module>>.api import <<ClusterHistoryApi>>  # hypothetical module / class
from databricks_cli.clusters.api import ClusterApi
 
api_client = ApiClient(
  host  = DATABRICKS_HOST,
  token = DATABRICKS_TOKEN
)
clusters_api = ClusterApi(api_client)
cluster_history_api = <<ClusterHistoryApi>>(api_client)  # ie: an API which provides history access to DS&E clusters
 
cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')
 
# unix_start / unix_end: window bounds as epoch timestamps
cluster_code_exec_history = cluster_history_api.get_events(
  cluster_id, unix_start, unix_end, 'ASC', '', 0, 500
).get('code_execution_history')  # ie: history of all code segments / cells / notebooks executed on the specified DS&E cluster
 
df = spark.read.json(sc.parallelize(cluster_code_exec_history))  # profit

Atanu
Esteemed Contributor

rendorHaevyn
New Contributor III

@Atanu Sarkar Yes, your proposal will work - thank you.
