
History of code executed on Data Science & Engineering service clusters

rendorHaevyn
New Contributor III

I want to be able to view a listing of any or all of the following:

  • When notebooks were attached to or detached from a DS&E cluster
  • When notebook code was executed on a DS&E cluster
  • Which specific notebook cell code was executed on a DS&E cluster

Is this currently possible?

I have explored the Clusters and Jobs/Runs APIs; however, these appear to cover only jobs/workflows, not ad-hoc code executed from notebooks.
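
For illustration, this is roughly as far as I can get today. A minimal sketch, assuming the databricks-cli package and DATABRICKS_HOST / DATABRICKS_TOKEN environment variables; it lists recent job runs, which is exactly the jobs/workflows-only scope described above:

import os
from databricks_cli.sdk.api_client import ApiClient

api_client = ApiClient(host=os.environ['DATABRICKS_HOST'],
                       token=os.environ['DATABRICKS_TOKEN'])

# Jobs Runs API: returns scheduled/triggered job runs only;
# ad-hoc notebook commands never show up here
runs = api_client.perform_query('GET', '/jobs/runs/list',
                                data={'completed_only': 'true', 'limit': 25})
for r in runs.get('runs', []):
    print(r['run_id'], r.get('state', {}).get('result_state'))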

While the functionality I'm after appears to be available for Databricks SQL warehouses, I need the same functionality for DS&E clusters.
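
For comparison, on the DBSQL side the per-query history is exposed via the Query History API. A rough sketch, reusing the api_client from the sketch above; the response field names here are my reading of the public docs, not something I've confirmed:

# DBSQL Query History API: the per-query visibility that DS&E clusters lack
hist = api_client.perform_query('GET', '/sql/history/queries',
                                data={'max_results': 25})
for q in hist.get('res', []):
    print(q.get('query_start_time_ms'), q.get('query_text', '')[:80])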

The reason for this requirement is to determine which notebook and which code ran immediately before resize and expanded-disk events on a specific DS&E cluster.
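
The resize and disk events themselves are already retrievable from the Clusters events endpoint; what's missing is the link back to the notebook code that triggered them. A sketch, assuming the databricks-cli package and the RESIZING / EXPANDED_DISK event type names from the Clusters API docs:

import os, time
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi

api_client = ApiClient(host=os.environ['DATABRICKS_HOST'],
                       token=os.environ['DATABRICKS_TOKEN'])
clusters_api = ClusterApi(api_client)
cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

now_ms = int(time.time() * 1000)
events = clusters_api.get_events(
    cluster_id,
    now_ms - 24 * 3600 * 1000,      # start_time: last 24 hours
    now_ms,                         # end_time
    'ASC',                          # order
    ['RESIZING', 'EXPANDED_DISK'],  # event_types of interest
    0, 500                          # offset, limit
)
for e in events.get('events', []):
    print(e['timestamp'], e['type'], e.get('details', {}))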

Thanks.

1 ACCEPTED SOLUTION

Atanu
Esteemed Contributor

4 REPLIES

Debayan
Esteemed Contributor III

Hi, if you are asking about listing this through the UI, then it is not currently available.

Please tag @Debayan in your next response, which will notify me. Thank you!

rendorHaevyn
New Contributor III

@Debayan Mukherjee

Correct: some kind of API access would be good for this, e.g. something like the code below.

That way, I could construct a DataFrame of all queries made against a specified cluster, or at least determine which cells / notebooks were attached to and executed on the cluster at specific datetimes.

# Hypothetical sketch: ClusterHistoryApi does not exist today
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.<<module>>.api import <<ClusterHistoryAPI>>
from databricks_cli.clusters.api import ClusterApi

api_client = ApiClient(
    host  = DATABRICKS_HOST,
    token = DATABRICKS_TOKEN
)
clusters_api = ClusterApi(api_client)
cluster_history_api = ClusterHistoryApi(api_client)  # i.e. a (hypothetical) API providing history access to DS&E clusters

cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

# i.e. history of all code segments / cells / notebooks executed on the specified DS&E cluster
cluster_code_exec_history = cluster_history_api.get_events(
    cluster_id, unix_start, unix_end, 'ASC', '', 0, 500
).get('code_execution_history')

df = spark.read.json(sc.parallelize(cluster_code_exec_history))  # profit
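
Until an API like the above exists, the closest real equivalent of that last line is loading the cluster lifecycle events (not code history) into a DataFrame. A sketch, assuming the events response from the Clusters events call earlier in the thread and a notebook context where spark and sc are in scope:

import json

# Assumption: `events` is the dict returned by clusters_api.get_events(...)
events_rdd = sc.parallelize([json.dumps(e) for e in events.get('events', [])])
df = spark.read.json(events_rdd)
df.select('timestamp', 'type').show(truncate=False)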

Atanu
Esteemed Contributor

rendorHaevyn
New Contributor III

@Atanu Sarkar Yes, your proposal will work. Thank you.
