โ10-07-2024 07:13 PM
how to use rest api to find long running query in databricks from sql/queries/all
โ10-07-2024 09:10 PM
Hi @slakshmanan
To find long-running queries in Databricks using the REST API, specifically from the /sql/queries/all endpoint, you'll need to follow these steps:
Prerequisites:
Databricks Account with REST API enabled.
Access Token: You need a valid access token for authentication. You can generate it via Databricks UI under the "User Settings" page.
Workspace URL: Your Databricks workspace URL (e.g., https://<databricks-instance>.cloud.databricks.com).
Steps:
1. API Authentication
You can authenticate using your Bearer Token in the headers of the API request.
Authorization: Bearer <Access Token>
2. REST API Endpoint
The API endpoint for listing all SQL queries is:
GET /api/2.0/sql/history/queries
3. Query Parameters
page: Page number of the results.
page_size: The number of queries per page.
This will list the SQL query history from the most recent to the oldest.
4. Filter for Long-Running Queries
You'll need to programmatically analyze the query results to identify long-running ones. In the response, you'll get details like:
query_id: Unique identifier for the query.
start_time_ms: Query start time in epoch milliseconds.
end_time_ms: Query end time in epoch milliseconds.
duration_ms: Duration of the query in milliseconds.
query_text: The SQL query text.
You can use the duration_ms to filter out queries that ran for too long.
Example Response Snippet:
json
{
"res": [
{
"query_id": "abc123",
"query_text": "SELECT * FROM example_table",
"start_time_ms": 1696456800000,
"end_time_ms": 1696460400000,
"duration_ms": 3600000
},
...
]
}
You can calculate the duration of each query (duration_ms) and flag queries that exceed a certain threshold (e.g., 10 minutes = 600,000 ms).
5. Code Example (Python + requests):
python
import requests
# Replace with your Databricks instance and access token
databricks_instance = "<databricks-instance>"
access_token = "<your-access-token>"
headers = {
"Authorization": f"Bearer {access_token}"
}
# API URL to fetch SQL query history
url = f"https://{databricks_instance}.cloud.databricks.com/api/2.0/sql/history/queries"
# API request with optional filters (pagination can be added)
response = requests.get(url, headers=headers)
if response.status_code == 200:
queries = response.json().get("res", [])
# Filter for long-running queries (e.g., >10 minutes)
long_running_queries = [q for q in queries if q.get('duration_ms', 0) > 600000]
for query in long_running_queries:
print(f"Query ID: {query['query_id']}, Duration: {query['duration_ms']} ms")
else:
print(f"Error: {response.status_code} - {response.text}")This script will fetch the query history and filter out the queries that have run for more than 10 minutes (600,000 milliseconds).
โ10-08-2024 10:13 AM
I would like to see if there are any queries running with run time more than 30 minutes or queries pulling more than 1 million
โ10-08-2024 07:38 AM
Thanks, i will try this and let you know
โ10-08-2024 07:45 AM
@Ajay-Pandey to find out if my databricks is using rest api, how do i check it
โ10-08-2024 10:39 AM
To check for API call logs in Databricks, you can follow these steps:
โ10-08-2024 02:40 PM
how to cancel or kill a long running sql query from databricks python notebook.I have a long running sql query in sql warehouse
โ10-08-2024 11:43 PM
You can run this query to get the long running queries and then kill the query you wanted to kill.
# Step 1: Get active queries
active_queries = spark.sql("SHOW PROCESSLIST")
active_queries.show(truncate=False)
# Step 2: Identify the query ID you want to kill
# (Assume you noted the ID as '12345')
# Step 3: Kill the query
spark.sql("KILL QUERY 12345")
โ11-11-2024 11:47 PM
@Ajay-Pandey - Hi .. I want to check the queries that are actively running and taking long time for execution.. please suggest how to check? Thanks.
โ11-13-2024 11:45 PM
@Ajay-Pandey Thanks. This API works fine. But it is showing only the current day queries. How can we get the all queries which is currently running.