Friday
When you navigate to Compute > Select Cluster > Spark UI > JDBC/ODBC, you can see grids of Session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis?
Thanks
yesterday
Hello Iona, You cannot natively query the exact Session stats and SQL stats from the JDBC/ODBC Spark UI via a simple SQL statement in Databricks today. However, advanced users and admins can access some of the underlying data via log tables (like prod.thrift_statements), the Query History API, or specialized REST endpoints. For practical analysis, using the Query History API and parsing the results into Python or SQL is the closest workaround currently available.
Hope this helps, Louis.
17 hours ago
Great info. Thank you ever so much.
My actual need is to find out, in a programmatic manner, which tables in Databricks are being used by a Power BI dashboard. If you open the Power BI file itself you can see the data model and list the tables. I would have thought that one of the Power BI REST API endpoints would give this info, since you can do things through it such as setting off a refresh, but it seems that is not the case.

So another approach is to start at the Databricks end and examine what requests are made of it. Looking at the Spark info I can see the queries hitting the database, and by looking at the user/service principal I can see what is making those requests. So by parsing the SQL statement, which for a refresh will be "Select * from <<SometableInPowerBI>>", I will be able to say: aha, that's a table of interest to me. My aim is then to monitor these tables to confirm they are being refreshed, so that I know all the data in our dashboards is up to date.
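In rough terms, this is the kind of thing I have in mind (a sketch only; the service principal name is made up, and the column names assume the system.query.history table mentioned below):

import re

# Sketch only: pull statements issued by the Power BI service principal
# from the query history system table, then extract the table names.
# "sp-powerbi-refresh" is a made-up principal name.
pbi_principal = "sp-powerbi-refresh"

df = spark.sql(f"""
    SELECT statement_text, end_time
    FROM system.query.history
    WHERE executed_by = '{pbi_principal}'
      AND end_time IS NOT NULL
""")

# A refresh shows up as "Select * from <table>"; capture the table name.
pattern = re.compile(r"select\s+\*\s+from\s+([\w.`]+)", re.IGNORECASE)

last_refresh = {}
for row in df.collect():
    m = pattern.search(row.statement_text)
    if m:
        table = m.group(1).strip("`")
        if table not in last_refresh or row.end_time > last_refresh[table]:
            last_refresh[table] = row.end_time

for table, ts in sorted(last_refresh.items()):
    print(f"{table} last refreshed at {ts}")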
yesterday
Hi @IONA,
Totally agree with @BigRoux. To make his point actionable, here are official docs you can use:
Query history system table (system.query.history): https://docs.databricks.com/aws/en/admin/system-tables/query-history
Query History (overview/UI): https://docs.databricks.com/aws/en/sql/user/queries/query-history
Query History REST API: https://docs.databricks.com/api/workspace/queryhistory/list
Spark Thrift Server (why JDBC/ODBC UI grids aren't exposed as tables; retention settings): https://spark.apache.org/docs/latest/configuration.html#spark-sql
19 hours ago - last edited 18 hours ago
Hi @IONA ,
As @BigRoux correctly suggested, there is no native way to get stats from the JDBC/ODBC Spark UI.
1. You can try the query history system table, but it has a limited number of metrics:
%sql
SELECT *
FROM system.query.history
2. You can use the /api/2.0/sql/history/queries endpoint with the include_metrics flag enabled, which should return detailed per-query metrics in the payload.
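For example, a minimal sketch calling it from a notebook with the context's token (the max_results value is arbitrary):

from dbruntime.databricks_repl_context import get_context
import requests

context = get_context()
host = context.browserHostName

response = requests.get(
    f"https://{host}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {context.apiToken}"},
    params={"include_metrics": "true", "max_results": 25},
)
response.raise_for_status()

for q in response.json().get("res", []):
    # each entry carries the statement text plus a "metrics" struct
    print(q.get("query_text"), q.get("metrics"))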
3. Metrics can also be obtained for the following:
4. And lastly, you can use the Apache Spark REST API monitoring endpoints, which give you access to many different metrics. Here, just for the sake of an example, I'm using it to get the environment configuration of my cluster, but there are many more. You can find the full list here:
Monitoring and Instrumentation - Spark 4.0.0 Documentation
from dbruntime.databricks_repl_context import get_context
import requests

context = get_context()
host = context.browserHostName
cluster_id = context.clusterId

# The Spark UI REST API is reachable through the driver proxy
spark_ui_base_url = f"https://{host}/driver-proxy-api/o/0/{cluster_id}/40001/api/v1/"
endpoint = 'applications/local-1756797804565/environment'

response = requests.get(
    spark_ui_base_url + endpoint,
    headers={"Authorization": f"Bearer {context.apiToken}"}
)

if response.status_code == 200:
    try:
        data = response.json()
        print(data)
    except requests.exceptions.JSONDecodeError:
        print("Response is not valid JSON:")
        print(response.text)
else:
    print(f"Request failed with status code: {response.status_code}")
    print(f"Response: {response.text}")
17 hours ago
That is great. Thanks