@Milind Keer:
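To extract the execution time of each cell in a notebook using the Databricks REST API, you can start from the Workspace API (api/2.0/workspace).
First, call the api/2.0/workspace/get-status endpoint to confirm the notebook path and get its object ID. Then call api/2.0/workspace/export with the same path to download the notebook content; the Workspace API has no get endpoint, and export returns the content base64-encoded, so decode it before parsing. A SOURCE export normally contains only the notebook's code, so per-cell timings can be calculated only if each cell carries a start and end timestamp in its text, for example ones you log yourself. Where those timestamps are present, subtract the start time from the end time to get each cell's execution time.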
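When notebook content is downloaded through api/2.0/workspace/export, it comes back as a base64 string in the response's content field. A minimal sketch of the decoding step, assuming a response body shaped like {'content': '<base64>'}; decode_export is a hypothetical helper name, not part of any Databricks library, and the sample payload is built locally rather than taken from real API output:

```python
import base64


def decode_export(payload: dict) -> str:
    """Return the notebook source text from an export response body."""
    # 'content' holds the base64-encoded notebook; decode it to UTF-8 text.
    return base64.b64decode(payload["content"]).decode("utf-8")


# Sample payload built locally for illustration, not real API output.
sample = {"content": base64.b64encode(b"print('hello')\n").decode("ascii")}
print(decode_export(sample))
```

Keeping the decode in one small function makes it easy to check the parsing step against a saved response without calling the workspace again.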
Here's an example Python script that uses the Databricks REST API to extract the execution time of each cell in a notebook:
import base64
import requests
import pandas as pd

# Databricks workspace URL
url = 'https://<databricks-instance>.azuredatabricks.net'
# Databricks workspace API token
token = '<databricks-token>'
# Notebook path
notebook_path = '/path/to/notebook'
headers = {'Authorization': f'Bearer {token}'}

# Get notebook ID (this also confirms that the path exists)
get_status_url = f'{url}/api/2.0/workspace/get-status'
response = requests.get(get_status_url, headers=headers, params={'path': notebook_path})
response.raise_for_status()
notebook_id = response.json()['object_id']

# Get notebook content through the export endpoint (the Workspace API has no /get);
# 'content' in the response is base64-encoded
export_url = f'{url}/api/2.0/workspace/export'
params = {'path': notebook_path, 'format': 'SOURCE'}
response = requests.get(export_url, headers=headers, params=params)
response.raise_for_status()
notebook_content = base64.b64decode(response.json()['content']).decode('utf-8')

# Extract cell execution time
# Assumes each cell's first and last lines start with '<timestamp> - ';
# a plain SOURCE export normally holds only code, so cells without timestamps are skipped
cells = notebook_content.split('# COMMAND ----------\n')
for cell in cells[1:]:
    cell = cell.strip()
    if not cell:
        continue
    cell_content = cell.split('\n')
    cell_start_time = cell_content[0].split(' - ')[0]
    cell_end_time = cell_content[-1].split(' - ')[0]
    try:
        execution_time = (pd.to_datetime(cell_end_time) - pd.to_datetime(cell_start_time)).total_seconds()
    except (ValueError, TypeError):
        print('No timestamps found in this cell; skipping')
        continue
    print(f'Cell execution time: {execution_time} seconds')
You will need to replace <databricks-instance>, <databricks-token>, and /path/to/notebook with your Databricks instance URL, API token, and the path to your notebook, respectively.
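To check the parsing logic without calling the API, the same split-and-subtract step can be run on a local string. This is a sketch under one assumption: each cell's first and last lines start with an ISO 8601 timestamp followed by ' - ', a hypothetical layout such as one produced by your own logging. cell_durations is an illustrative name, and the standard library's datetime stands in for pandas here so the example has no dependencies:

```python
from datetime import datetime

SEPARATOR = "# COMMAND ----------\n"


def cell_durations(source: str) -> list:
    """Return seconds elapsed per cell, or None where timestamps are missing."""
    durations = []
    for cell in source.split(SEPARATOR)[1:]:
        lines = cell.strip().split("\n")
        if lines == [""]:
            continue  # skip empty cells
        try:
            start = datetime.fromisoformat(lines[0].split(" - ")[0])
            end = datetime.fromisoformat(lines[-1].split(" - ")[0])
        except ValueError:
            durations.append(None)  # no parseable timestamps in this cell
            continue
        durations.append((end - start).total_seconds())
    return durations


# Hand-written sample in the assumed layout, not real export output.
sample = (
    "# Databricks notebook source\n"
    "# COMMAND ----------\n"
    "2024-01-01T10:00:00 - start\n"
    "x = 1\n"
    "2024-01-01T10:00:05 - end\n"
    "# COMMAND ----------\n"
    "y = 2\n"
)
print(cell_durations(sample))
```

Returning None for cells without timestamps keeps the result aligned with the cell order, so a missing timing is visible instead of silently dropped.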