Machine Learning

History of code executed on Data Science & Engineering service clusters

rendorHaevyn
New Contributor III

I want to be able to view a listing of any or all of the following:

  • When Notebooks were attached / detached to and from a DS&E cluster
  • When Notebook code was executed on a DS&E cluster
  • What Notebook specific cell code was executed on a DS&E cluster

Is this currently possible?

I have explored the Clusters and Jobs/Runs APIs; however, these appear to cover only jobs/workflows, not ad-hoc code executed from notebooks.

While it appears that the functionality that I'm after is available on Databricks SQL service warehouses, I need the same functionality for DS&E clusters.
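For reference, on SQL warehouses this is surfaced by the Query History API (`GET /api/2.0/sql/history/queries`). Below is a minimal sketch of calling it; the flattened `filter_by.*` parameter names and the `res` response field are taken from the public REST reference and may need adjusting, and `host`/`token` are placeholders for your workspace URL and a personal access token:

```python
import json
import urllib.request
from urllib.parse import urlencode

def build_history_params(start_ms: int, end_ms: int, max_results: int = 100) -> dict:
    """Query-string parameters for the SQL Query History endpoint."""
    return {
        "filter_by.query_start_time_range.start_time_ms": start_ms,
        "filter_by.query_start_time_range.end_time_ms": end_ms,
        "max_results": max_results,
    }

def list_warehouse_queries(host: str, token: str, start_ms: int, end_ms: int) -> list:
    """Fetch SQL-warehouse query history entries in a time window."""
    url = f"{host}/api/2.0/sql/history/queries?" + urlencode(
        build_history_params(start_ms, end_ms)
    )
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        # Each entry includes the query text, user, and timing fields
        return json.load(resp).get("res", [])
```

No equivalent per-cell execution history endpoint is documented for DS&E (all-purpose) clusters, which is the gap this question is about.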

The reason for this requirement is to determine which code and notebook executions immediately preceded resize and disk-expansion events on a specific DS&E cluster.
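The sizing events themselves can be pulled from the Cluster Events API (`POST /api/2.0/clusters/events`), which gives the timestamps to correlate against. A sketch under the assumption that the event-type names (`RESIZING`, `UPSIZE_COMPLETED`, `EXPANDED_DISK`) match the current REST reference; `host`/`token` are placeholders:

```python
import json
import urllib.request

def build_events_payload(cluster_id: str, start_ms: int, end_ms: int,
                         limit: int = 500) -> dict:
    """Request body for POST /api/2.0/clusters/events, filtered to sizing events."""
    return {
        "cluster_id": cluster_id,
        "start_time": start_ms,   # epoch milliseconds
        "end_time": end_ms,
        "event_types": ["RESIZING", "UPSIZE_COMPLETED", "EXPANDED_DISK"],
        "order": "ASC",
        "limit": limit,
    }

def get_sizing_events(host: str, token: str, cluster_id: str,
                      start_ms: int, end_ms: int) -> list:
    """Fetch resize / disk-expansion events for one cluster in a time window."""
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/events",
        data=json.dumps(build_events_payload(cluster_id, start_ms, end_ms)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("events", [])
```

Each returned event carries a `timestamp`, so the open question is only the other half of the join: what notebook code ran just before each event.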

Thanks.

1 ACCEPTED SOLUTION


Atanu
Esteemed Contributor
4 REPLIES

Debayan
Esteemed Contributor III

Hi, if you are asking about listing this through the UI, that is not currently available.

Please tag @Debayan in your next response, which will notify me. Thank you!

rendorHaevyn
New Contributor III

@Debayan Mukherjee

Correct - some kind of API access would be good for this, e.g. the code below.

So, I would be able to construct a dataframe of all queries run against a specified cluster, or at least determine which cells/notebooks were attached to and executed on the cluster within a specific date-time range.

from databricks_cli.sdk.api_client import ApiClient
 
from databricks_cli.<<module>>.api import <<ClusterHistoryApi>>  # hypothetical module / class
from databricks_cli.clusters.api import ClusterApi
 
api_client = ApiClient(
  host  = DATABRICKS_HOST,
  token = DATABRICKS_TOKEN
)
clusters_api = ClusterApi(api_client)
cluster_history_api = <<ClusterHistoryApi>>(api_client)  # ie: an API which provides history access to DS&E clusters
 
cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')
 
# unix_start / unix_end: window bounds as epoch timestamps
cluster_code_exec_history = cluster_history_api.get_events(
  cluster_id, unix_start, unix_end, 'ASC', '', 0, 500
).get('code_execution_history')  # ie: history of all code segments / cells / notebooks executed on the specified DS&E cluster
 
df = spark.read.json(sc.parallelize(cluster_code_exec_history))  # profit

Atanu
Esteemed Contributor

rendorHaevyn
New Contributor III

@Atanu Sarkar Yes, your proposal will work - thank you.
