Databricks

Mikki007 · 04-27-2023

Hi

I have a notebook with many command line cells in it.

I want to extract the execution time of each cell using the Databricks REST API. How can I do that?

Please note: I managed to get the start and end time of the job for the notebook using the REST API (/2.1/jobs/runs/get), but I am struggling to get them at the cell level.

I am on Azure Databricks Runtime version 13.

Any help on this would be highly appreciated.
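
For context, the run-level timing mentioned in the question can be sketched as follows. This is a minimal sketch, assuming that the /2.1/jobs/runs/get response reports start_time and end_time as epoch milliseconds and that end_time is only meaningful once the run has finished. The helper names (run_duration_seconds, fetch_run) and the sample values are hypothetical and are not part of the Databricks API.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def run_duration_seconds(run: dict) -> float:
    """Return end_time - start_time in seconds for a runs/get response.

    Both fields are treated as epoch milliseconds (an assumption of this sketch).
    """
    return (run["end_time"] - run["start_time"]) / 1000.0


def fetch_run(host: str, token: str, run_id: int) -> dict:
    """Call GET /api/2.1/jobs/runs/get for one run and return the parsed JSON."""
    query = urllib.parse.urlencode({"run_id": run_id})
    request = urllib.request.Request(
        f"{host}/api/2.1/jobs/runs/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical response fragment, used here instead of a live call.
sample_run = {"run_id": 42, "start_time": 1682582400000, "end_time": 1682582465500}

started = datetime.fromtimestamp(sample_run["start_time"] / 1000, tz=timezone.utc)
print(f"Run started at {started.isoformat()}")
print(f"Run duration: {run_duration_seconds(sample_run)} seconds")
```

This only gives the duration of the whole run; it does not break the time down per cell, which is what the question is about.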

Anonymous · 04-28-2023

@Milind Keer:

To extract the execution time of each cell in a notebook using the Databricks REST API, you can use the get method of the api/2.0/workspace endpoint.

First, you need to get the notebook's ID using the api/2.0/workspace/get-status endpoint. Once you have the ID, you can use the get method of the api/2.0/workspace endpoint to get the notebook content. The response includes the cell's start time and end time. You can then calculate the execution time of each cell.

Here's an example Python script that uses the Databricks REST API to extract the execution time of each cell in a notebook:

import pandas as pd
import requests

# Databricks workspace URL
url = 'https://<databricks-instance>.azuredatabricks.net'

# Databricks workspace API token
token = '<databricks-token>'

# Notebook path
notebook_path = '/path/to/notebook'

# Get notebook ID
get_status_url = f'{url}/api/2.0/workspace/get-status'
headers = {'Authorization': f'Bearer {token}'}
params = {'path': notebook_path}
response = requests.get(get_status_url, headers=headers, params=params)
response.raise_for_status()
notebook_id = response.json()['object_id']

# Get notebook content (the format is sent as a query parameter, not a request body)
get_url = f'{url}/api/2.0/workspace/get'
params = {'path': notebook_path, 'format': 'SOURCE'}
response = requests.get(get_url, headers=headers, params=params)
response.raise_for_status()
notebook_content = response.json()['content']

# Extract cell execution time
# This assumes the first and last line of each cell start with a timestamp followed by ' - '
cells = notebook_content.split('# COMMAND ----------\n')
for cell in cells[1:]:
    cell = cell.strip()
    if not cell:
        continue
    cell_content = cell.split('\n')
    cell_start_time = cell_content[0].split(' - ')[0]
    cell_end_time = cell_content[-1].split(' - ')[0]
    execution_time = (pd.to_datetime(cell_end_time) - pd.to_datetime(cell_start_time)).total_seconds()
    print(f'Cell execution time: {execution_time} seconds')

You will need to replace <databricks-instance>, <databricks-token>, and /path/to/notebook with your Databricks instance URL, API token, and the path to your notebook, respectively.

Mikki007 · 05-02-2023

@Suteja Kanuri

Thanks for your reply; however, I am getting the error below:

b'{"error_code":"ENDPOINT_NOT_FOUND","message":"No API found for \'GET /workspace/get\'"}'

The GET endpoint has been deprecated, as per the doc below:

Workspace API 2.0 | Databricks on AWS

I even tried 'Export', but it returned nothing (blank).
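
One possible reason for a seemingly blank export is that /api/2.0/workspace/export returns the notebook as a base64-encoded string in the content field of a JSON body, so it has to be decoded before it is readable. The following is a minimal sketch under that assumption; the helper names (export_notebook, decode_export, split_cells) and the sample notebook are hypothetical. As far as I can tell, a SOURCE export holds only the cell code separated by "# COMMAND ----------" lines and no execution timestamps, so this can confirm that the export works but cannot by itself produce per-cell timings.

```python
import base64
import json
import urllib.parse
import urllib.request

CELL_SEPARATOR = "# COMMAND ----------"


def export_notebook(host: str, token: str, path: str) -> dict:
    """Call GET /api/2.0/workspace/export with format=SOURCE and return the JSON."""
    query = urllib.parse.urlencode({"path": path, "format": "SOURCE"})
    request = urllib.request.Request(
        f"{host}/api/2.0/workspace/export?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def decode_export(payload: dict) -> str:
    """Decode the base64 `content` field of an export response into text."""
    return base64.b64decode(payload["content"]).decode("utf-8")


def split_cells(source: str) -> list:
    """Split exported Python notebook source into non-empty cell bodies."""
    stripped = (cell.strip() for cell in source.split(CELL_SEPARATOR))
    return [cell for cell in stripped if cell]


# Hypothetical export payload, built locally instead of calling the API.
sample_source = (
    "# Databricks notebook source\n"
    "print('a')\n\n"
    "# COMMAND ----------\n\n"
    "print('b')\n"
)
sample_payload = {"content": base64.b64encode(sample_source.encode("utf-8")).decode("ascii")}

cells = split_cells(decode_export(sample_payload))
print(f"Found {len(cells)} cells")
```

With a real workspace, export_notebook(url, token, notebook_path) would stand in for sample_payload.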

Anonymous · 04-28-2023

Hi @Milind Keer

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!

How to extract the start and end time of the command line cell of the notebook using REST API in Azure Databricks?