History of code executed on Data Science & Engineering service clusters

rendorHaevyn
New Contributor III

I want to be able to view a listing of any or all of the following:

  • When Notebooks were attached / detached to and from a DS&E cluster
  • When Notebook code was executed on a DS&E cluster
  • What Notebook specific cell code was executed on a DS&E cluster

Is this currently possible?

I have explored the Clusters and Jobs/Runs APIs; however, these appear to cover only jobs/workflows, not ad-hoc notebook-executed code.

While it appears that the functionality that I'm after is available on Databricks SQL service warehouses, I need the same functionality for DS&E clusters.

The reason for this requirement is to determine which notebook and code immediately preceded (and presumably triggered) resize and expanded-disk events on a specific DS&E cluster.
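(For context: the resize and disk-expansion events themselves are already exposed by the Clusters events endpoint, `GET /api/2.0/clusters/events`; it is only the notebook/cell execution history that is missing. A minimal sketch of filtering such a response for the relevant event types — the payload below is a hand-written stand-in for a real API response, not actual output:)

```python
# Sketch: filter a Clusters events response (shape per the public
# /api/2.0/clusters/events endpoint) down to resize / disk events.
# The sample payload is a hand-written stand-in, not real API output.
sample_response = {
    "events": [
        {"cluster_id": "0101-abc", "timestamp": 1672531200000,
         "type": "RUNNING", "details": {}},
        {"cluster_id": "0101-abc", "timestamp": 1672534800000,
         "type": "RESIZING",
         "details": {"current_num_workers": 2, "target_num_workers": 8}},
        {"cluster_id": "0101-abc", "timestamp": 1672538400000,
         "type": "EXPANDED_DISK",
         "details": {"free_space": 1073741824}},
    ]
}

# Event types of interest (these names are real Clusters API event types).
RELEVANT = {"RESIZING", "UPSIZE_COMPLETED", "EXPANDED_DISK"}

def scaling_events(response):
    """Return (timestamp, type) pairs for scale/disk events, oldest first."""
    return sorted(
        (e["timestamp"], e["type"])
        for e in response.get("events", [])
        if e["type"] in RELEVANT
    )

print(scaling_events(sample_response))
```

With the timestamps of these events in hand, the missing piece is matching them against *what code ran just before* — which is the history this question is asking for.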

Thanks.


Debayan
Esteemed Contributor III

Hi, if you are asking whether you can list this through the UI, then it is not currently available.

Please tag @Debayan in your next response, which will notify me. Thank you!

rendorHaevyn
New Contributor III

@Debayan Mukherjee

Correct - some kind of API access would work for this, e.g. the code below.

With that, I could construct a dataframe of all queries made against a specified cluster, or at least determine which cells / notebooks were attached to and executed on the cluster at specific date-times.

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi

# Hypothetical module / class -- this is the API I am looking for:
from databricks_cli.<<module>>.api import <<ClusterHistoryApi>>

api_client = ApiClient(
    host  = DATABRICKS_HOST,
    token = DATABRICKS_TOKEN
)
clusters_api = ClusterApi(api_client)
cluster_history_api = <<ClusterHistoryApi>>(api_client)  # ie: an API providing history access to DS&E clusters

cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

# History of all code segments / cells / notebooks executed on the specified DS&E cluster:
cluster_code_exec_history = cluster_history_api.get_events(
    cluster_id, unix_start, unix_end, 'ASC', '', 0, 500
).get('code_execution_history')

df = spark.read.json(sc.parallelize(cluster_code_exec_history))  # profit
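(While the `code_execution_history` part above is hypothetical, the event half is reachable today: the real `ClusterApi.get_events` / `/api/2.0/clusters/events` call pages through cluster events via a `next_page` token. A minimal pagination sketch, with a fake fetcher standing in for the real API call — only the response shape matches the public API:)

```python
def iter_cluster_events(fetch_page, cluster_id, limit=500):
    """Yield all events for a cluster, following next_page tokens.

    fetch_page(cluster_id, offset, limit) stands in for the real
    ClusterApi.get_events / REST call; only the response shape
    ("events" list plus optional "next_page") mirrors the public API.
    """
    offset = 0
    while True:
        page = fetch_page(cluster_id, offset, limit)
        for event in page.get("events", []):
            yield event
        nxt = page.get("next_page")
        if not nxt:
            break
        offset = nxt["offset"]

# Fake two-page fetcher for demonstration only.
def fake_fetch(cluster_id, offset, limit):
    if offset == 0:
        return {"events": [{"type": "RESIZING"}],
                "next_page": {"cluster_id": cluster_id, "offset": 1}}
    return {"events": [{"type": "UPSIZE_COMPLETED"}]}

print([e["type"] for e in iter_cluster_events(fake_fetch, "0101-abc")])
# -> ['RESIZING', 'UPSIZE_COMPLETED']
```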

Atanu
Esteemed Contributor

rendorHaevyn
New Contributor III

@Atanu Sarkar Yes, your proposal will work - thank you.
