@Debayan Mukherjee
Correct - some kind of API access would be ideal for this, e.g. the sketch below.
With it, I would be able to construct a DataFrame of all queries made against a specified cluster, or at least determine which cells / notebooks were attached to, and executed on, that cluster between given datetimes.
import json

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi
from databricks_cli.<<module>>.api import <<ClusterHistoryApi>>  # hypothetical - no such module exists today

api_client = ApiClient(
    host=DATABRICKS_HOST,
    token=DATABRICKS_TOKEN
)

clusters_api = ClusterApi(api_client)
cluster_history_api = ClusterHistoryApi(api_client)  # ie: the wished-for API providing history access to DS&E clusters

cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

# ie: the history of all code segments / cells / notebooks executed on the specified DS&E cluster
cluster_code_exec_history = cluster_history_api.get_events(cluster_id, unix_start, unix_end, 'ASC', '', 0, 500).get('code_execution_history')

df = spark.read.json(sc.parallelize([json.dumps(e) for e in cluster_code_exec_history]))  # profit
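
For reference, the closest real endpoint today is the cluster events API, which only surfaces lifecycle events (starts, resizes, terminations, etc.), not the code that actually ran - which is exactly the gap. A minimal sketch of what is runnable right now with databricks-cli's actual ClusterApi.get_events, reusing the same placeholder host / token / cluster name as above:

import json

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi

api_client = ApiClient(host=DATABRICKS_HOST, token=DATABRICKS_TOKEN)
clusters_api = ClusterApi(api_client)

cluster_id = clusters_api.get_cluster_by_name('DataSciEng_Service_ClusterName').get('cluster_id')

# Real endpoint (clusters/events): lifecycle events only (RUNNING, RESIZING,
# TERMINATING, ...), *not* the notebooks / cells executed on the cluster.
events = clusters_api.get_events(
    cluster_id,  # cluster to query
    unix_start,  # start_time, epoch millis
    unix_end,    # end_time, epoch millis
    'ASC',       # order
    [],          # event_types filter (empty = all)
    0,           # offset
    500          # limit
).get('events', [])

df = spark.read.json(sc.parallelize([json.dumps(e) for e in events]))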
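
There is also a partial workaround for the "all queries" half: if the compute were a SQL warehouse rather than a DS&E cluster, the Query History REST API (GET /api/2.0/sql/history/queries) already returns per-query records. A rough sketch, assuming DATABRICKS_HOST includes the https:// scheme and <warehouse-id> is a placeholder (the exact filter_by shape is worth double-checking against the REST reference):

import json
import requests

resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json={
        "filter_by": {
            "warehouse_ids": ["<warehouse-id>"],  # placeholder warehouse ID
            "query_start_time_range": {
                "start_time_ms": unix_start,
                "end_time_ms": unix_end,
            },
        },
        "max_results": 100,
    },
)
queries = resp.json().get("res", [])  # list of QueryInfo records
df = spark.read.json(sc.parallelize([json.dumps(q) for q in queries]))

That covers SQL warehouses only, though - for all-purpose DS&E clusters, something like the hypothetical ClusterHistoryApi above is still missing.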