History of code executed on Data Science & Engineering service clusters

rendorHaevyn
New Contributor III

I want to be able to view a listing of any or all of the following:

  • When Notebooks were attached / detached to and from a DS&E cluster
  • When Notebook code was executed on a DS&E cluster
  • What Notebook specific cell code was executed on a DS&E cluster

Is this currently possible?

I have explored the Clusters and Jobs/Runs APIs; however, these appear to cover only jobs/workflows, not ad-hoc code executed from notebooks.

While the functionality I'm after appears to be available for Databricks SQL warehouses (via query history), I need the same functionality for DS&E clusters.

The reason for this requirement is to determine which notebook and code ran immediately before, and likely triggered, the resize and expanded-disk events observed on a specific DS&E cluster.
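(For context, the resize and disk-expansion events themselves are already retrievable today through the Clusters Events REST endpoint, `POST /api/2.0/clusters/events`; it is only the code-execution side that is missing. A minimal stdlib-only sketch of pulling those events, where the host, token, and cluster ID values are placeholders you would substitute:)

```python
import json
import urllib.request

DATABRICKS_HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
DATABRICKS_TOKEN = "<personal-access-token>"                  # placeholder

def build_events_request(cluster_id, start_ms, end_ms, event_types):
    """Request body for POST /api/2.0/clusters/events."""
    return {
        "cluster_id": cluster_id,
        "start_time": start_ms,      # epoch milliseconds
        "end_time": end_ms,
        "event_types": event_types,  # e.g. ["RESIZING", "EXPANDED_DISK"]
        "order": "ASC",
        "limit": 500,
    }

def get_cluster_events(payload):
    """Page through the events endpoint, following next_page cursors."""
    events = []
    while True:
        req = urllib.request.Request(
            f"{DATABRICKS_HOST}/api/2.0/clusters/events",
            data=json.dumps(payload).encode(),
            headers={
                "Authorization": f"Bearer {DATABRICKS_TOKEN}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        events.extend(body.get("events", []))
        if "next_page" not in body:
            return events
        payload = body["next_page"]  # next_page is the request body for the next chunk
```

Correlating the timestamps of those events with notebook activity is the part this thread is asking about.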

Thanks.

Debayan
Databricks Employee

Hi, are you asking to list this through the UI? If so, it is not currently available.

Please tag @Debayan in your next response, which will notify me. Thank you!

rendorHaevyn
New Contributor III

@Debayan Mukherjee​ 

Correct: some kind of API access would be good for this, e.g. the sketch below.

That way, I could construct a DataFrame of all queries run against a specified cluster, or at least determine which cells / notebooks were attached to and executed on the cluster at specific points in time.

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.<<module>>.api import <<ClusterHistoryApi>>  # hypothetical: this API does not exist today
from databricks_cli.clusters.api import ClusterApi

api_client = ApiClient(
    host  = DATABRICKS_HOST,
    token = DATABRICKS_TOKEN
)
clusters_api = ClusterApi(api_client)
cluster_history_api = ClusterHistoryApi(api_client)  # i.e. an API providing history access to DS&E clusters

cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

cluster_code_exec_history = cluster_history_api.get_events(cluster_id, unix_start, unix_end, 'ASC', '', 0, 500).get('code_execution_history')  # i.e. history of all code segments / cells / notebooks executed on the specified DS&E cluster

df = spark.read.json(sc.parallelize(cluster_code_exec_history))  # profit
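(For what it's worth, the closest existing mechanism for per-command history appears to be verbose audit logging: when enabled on the workspace, notebook command executions are recorded in the delivered audit logs as `serviceName == "notebook"`, `actionName == "runCommand"` events. A hedged sketch of the filtering step, with field names taken from the audit-log schema and the delivery path left as a placeholder:)

```python
def notebook_run_commands(records):
    """Keep only audit-log records for notebook command executions.
    Requires verbose audit logs to be enabled on the workspace."""
    return [
        r for r in records
        if r.get("serviceName") == "notebook"
        and r.get("actionName") == "runCommand"
    ]

# On Databricks you would first load the delivered audit-log JSON, e.g.:
#   records = spark.read.json("<your-audit-log-delivery-path>").collect()
# and then apply the same predicate as a DataFrame filter instead.
```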

Atanu
Databricks Employee

rendorHaevyn
New Contributor III

@Atanu Sarkar Yes, your proposal will work. Thank you.