cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

how to use rest api to find long running query in databricks

slakshmanan
New Contributor III

how to use rest api to find long running query in databricks from sql/queries/all

9 REPLIES 9

Ajay-Pandey
Esteemed Contributor III


Hi @slakshmanan 

To find long-running queries in Databricks using the REST API, specifically from the /sql/queries/all endpoint, you'll need to follow these steps:

Prerequisites:


Databricks Account with REST API enabled.
Access Token: You need a valid access token for authentication. You can generate it via Databricks UI under the "User Settings" page.
Workspace URL: Your Databricks workspace URL (e.g., https://<databricks-instance>.cloud.databricks.com).
Steps:
1. API Authentication
You can authenticate using your Bearer Token in the headers of the API request.


Authorization: Bearer <Access Token>
2. REST API Endpoint
The API endpoint for listing all SQL queries is:

GET /api/2.0/sql/history/queries
3. Query Parameters
page: Page number of the results.
page_size: The number of queries per page.

This will list the SQL query history from the most recent to the oldest.

4. Filter for Long-Running Queries
You'll need to programmatically analyze the query results to identify long-running ones. In the response, you'll get details like:

query_id: Unique identifier for the query.
start_time_ms: Query start time in epoch milliseconds.
end_time_ms: Query end time in epoch milliseconds.
duration_ms: Duration of the query in milliseconds.
query_text: The SQL query text.
You can use the duration_ms to filter out queries that ran for too long.

Example Response Snippet:


json

{
"res": [
{
"query_id": "abc123",
"query_text": "SELECT * FROM example_table",
"start_time_ms": 1696456800000,
"end_time_ms": 1696460400000,
"duration_ms": 3600000
},
...
]
}
You can calculate the duration of each query (duration_ms) and flag queries that exceed a certain threshold (e.g., 10 minutes = 600,000 ms).

5. Code Example (Python + requests):


python
 

import requests

# Replace with your Databricks instance and access token
databricks_instance = "<databricks-instance>"
access_token = "<your-access-token>"

headers = {
"Authorization": f"Bearer {access_token}"
}

# API URL to fetch SQL query history
url = f"https://{databricks_instance}.cloud.databricks.com/api/2.0/sql/history/queries"

# API request with optional filters (pagination can be added)
response = requests.get(url, headers=headers)

if response.status_code == 200:
queries = response.json().get("res", [])

# Filter for long-running queries (e.g., >10 minutes)
long_running_queries = [q for q in queries if q.get('duration_ms', 0) > 600000]

for query in long_running_queries:
print(f"Query ID: {query['query_id']}, Duration: {query['duration_ms']} ms")
else:
print(f"Error: {response.status_code} - {response.text}")

This script will fetch the query history and filter out the queries that have run for more than 10 minutes (600,000 milliseconds).

Ajay Kumar Pandey

I would like to see if there are any queries running with run time more than 30 minutes or queries pulling more than 1 million 

slakshmanan
New Contributor III

Thanks, i will try this and let you know

 

slakshmanan
New Contributor III

@Ajay-Pandey  to find out if my databricks is using rest api, how do i check it

To check for API call logs in Databricks, you can follow these steps:

  • Log in to your Databricks workspace.
  • If you have the necessary permissions, click on your username in the top right corner and select Admin Console.
  • In the Admin Console, look for the Audit Logs section. This section provides detailed logs of actions taken in the workspace, including API calls.
Rishabh Pandey

slakshmanan
New Contributor III

how to cancel or kill a long running sql query from databricks python notebook.I have a long running sql query in sql warehouse

Rishabh-Pandey
Esteemed Contributor

You can run this query to get the long running queries and then kill the query you wanted to kill.

# Step 1: Get active queries
active_queries = spark.sql("SHOW PROCESSLIST")
active_queries.show(truncate=False)

# Step 2: Identify the query ID you want to kill
# (Assume you noted the ID as '12345')

# Step 3: Kill the query
spark.sql("KILL QUERY 12345")

@slakshmanan   

Rishabh Pandey

Srini_ADB
New Contributor II

@Ajay-Pandey - Hi .. I want to check the queries that are actively running and taking long time for execution.. please suggest how to check? Thanks.

Srini_ADB
New Contributor II

@Ajay-Pandey Thanks. This API works fine. But it is showing only the current day queries. How can we get the all queries which is currently running.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group