04-03-2023 08:04 PM
I want to be able to view a listing of the code segments, cells, and/or notebooks that were executed on a given DS&E cluster.
Is this currently possible?
I have explored the Clusters and Jobs/Runs APIs; however, these appear to cover only jobs/workflows, not ad-hoc code executed from notebooks.
While the functionality I'm after appears to be available for Databricks SQL warehouses, I need the same functionality for DS&E clusters.
The reason for this requirement is to determine which code and notebook ran immediately before resize and expanded-disk events on a specific DS&E cluster.
Thanks.
04-04-2023 11:02 AM
From the UI, the best way to check this is version control: https://docs.databricks.com/notebooks/notebooks-code.html#version-control
By the way, does this help: https://www.databricks.com/blog/2022/11/02/monitoring-notebook-command-logs-static-analysis-tools.ht... @Cameron McPherson?
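The verbose audit logs described in that blog record each notebook command as a runCommand action under the notebook service. A rough sketch of correlating those records with a cluster resize event, in plain Python - the sample records below are made up for illustration (field shapes follow the audit-log schema, values are not real), and the helper function is my own:

```python
import json

# Sample verbose-audit-log records; values are illustrative only.
SAMPLE_LOGS = """
{"serviceName": "notebook", "actionName": "runCommand", "timestamp": 1680500000000, "requestParams": {"notebookId": "123", "commandText": "df = spark.read.table('sales')"}}
{"serviceName": "clusters", "actionName": "resizeCluster", "timestamp": 1680500005000, "requestParams": {"cluster_id": "0403-xyz"}}
{"serviceName": "notebook", "actionName": "runCommand", "timestamp": 1680500010000, "requestParams": {"notebookId": "123", "commandText": "df.count()"}}
""".strip()

def commands_before(records, event_ts_ms, window_ms=60_000):
    """Return notebook runCommand entries in the window preceding an event."""
    return [
        r for r in records
        if r["serviceName"] == "notebook"
        and r["actionName"] == "runCommand"
        and event_ts_ms - window_ms <= r["timestamp"] <= event_ts_ms
    ]

records = [json.loads(line) for line in SAMPLE_LOGS.splitlines()]
resize = next(r for r in records if r["actionName"] == "resizeCluster")
preceding = commands_before(records, resize["timestamp"])
print([r["requestParams"]["commandText"] for r in preceding])
# -> ["df = spark.read.table('sales')"]
```

In practice you would run the same filter over the delivered audit-log files (or a table built from them) with Spark rather than over an in-memory list.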
04-03-2023 10:06 PM
Hi, are you saying you want to list this through the UI? If so, that is not currently available.
Please tag @Debayan with your next response which will notify me. Thank you!
04-03-2023 10:22 PM
@Debayan Mukherjee
Correct - some kind of API access would be good for this, e.g. the code below.
That way, I could construct a DataFrame of all queries made against a specified cluster, or at least determine which cells / notebooks were attached to and executed on the cluster, within a specific date-time range.
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.<<module>>.api import <<ClusterHistoryApi>>  # hypothetical - the API I'm looking for
from databricks_cli.clusters.api import ClusterApi

api_client = ApiClient(
    host=DATABRICKS_HOST,
    token=DATABRICKS_TOKEN,
)

clusters_api = ClusterApi(api_client)
cluster_history_api = ClusterHistoryApi(api_client)  # ie: an API which provides history access to DS&E clusters

cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

# ie: history of all code segments / cells / notebooks executed on the specified DS&E cluster
cluster_code_exec_history = cluster_history_api.get_events(
    cluster_id, unix_start, unix_end, 'ASC', '', 0, 500
).get('code_execution_history')

df = spark.read.json(sc.parallelize(cluster_code_exec_history))  # profit
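For reference, while no such code-execution-history endpoint exists, the real Clusters API can at least list the resize and disk-expansion events themselves via POST /api/2.0/clusters/events (event types such as RESIZING and EXPANDED_DISK) - just not the code that triggered them. A stdlib-only sketch; DATABRICKS_HOST / DATABRICKS_TOKEN are placeholders and the two helper function names are my own, not part of any SDK:

```python
import json
import urllib.request

def build_events_request(cluster_id, start_ms, end_ms, limit=500):
    """Payload for POST /api/2.0/clusters/events, which returns cluster
    lifecycle events (RESIZING, EXPANDED_DISK, ...) but not executed code."""
    return {
        "cluster_id": cluster_id,
        "start_time": start_ms,
        "end_time": end_ms,
        "order": "ASC",
        "event_types": ["RESIZING", "EXPANDED_DISK"],
        "limit": limit,
    }

def list_cluster_events(host, token, payload):
    """POST the payload to the workspace and return the events list."""
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/events",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("events", [])
```

The timestamps of those events could then be joined against notebook command logs (where available) to work out what ran just before each resize.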
04-04-2023 04:33 PM
@Atanu Sarkar Yes, your proposal will work - thank you.