Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Getting data from the Spark query profiler

IONA
New Contributor III

When you navigate to Compute > Select Cluster > Spark UI > JDBC/ODBC, you can see grids of Session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis?

 

Thanks


6 REPLIES

BigRoux
Databricks Employee

Hello Iona,

You cannot natively query the exact Session stats and SQL stats from the JDBC/ODBC Spark UI via a simple SQL statement in Databricks today. However, advanced users and admins can access some of the underlying data via log tables (like prod.thrift_statements), the Query History API, or specialized REST endpoints. For practical analysis, using the Query History API and parsing the results into Python or SQL is the closest workaround currently available.
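For illustration, here is a minimal sketch of that workaround: it calls the Query History REST API and loads the results into pandas for a simple per-user analysis. The notebook-context authentication, the max_results value, and the user_name/duration field names are assumptions made for this example, not a confirmed recipe.

# Minimal sketch: pull recent query history via the REST API and analyze it in pandas.
# Assumes this runs in a Databricks notebook, so host and token come from the REPL context;
# adjust authentication (e.g. a PAT) for other environments.
import requests
import pandas as pd
from dbruntime.databricks_repl_context import get_context

ctx = get_context()
resp = requests.get(
    f"https://{ctx.browserHostName}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {ctx.apiToken}"},
    params={"max_results": 100},
)
resp.raise_for_status()

# The response body holds the queries under the "res" key.
df = pd.DataFrame(resp.json().get("res", []))

# Example analysis: average query duration (milliseconds) per user.
if not df.empty and {"user_name", "duration"} <= set(df.columns):
    print(df.groupby("user_name")["duration"].mean().sort_values(ascending=False))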

 

Hope this helps, Louis.

IONA
New Contributor III

Great info. Thank you ever so much.

My actual need is to find out, in a programmatic manner, which tables in Databricks are being used by a Power BI dashboard. If you open the Power BI report itself you can see the data model and list the tables, and I would have thought one of the Power BI REST API endpoints would give this info, since you can do things through it such as set off a refresh. But it seems that is not the case.

So another approach would be to start at the Databricks end and examine what requests are made of it. Looking at the Spark info I can see the queries hitting the database, and by looking at the user/service principal I can see what is making those requests. By parsing the SQL statement, which for a refresh will be "Select * from <<SometableInPowerBI>>", I will be able to say: aha, that's a table of interest to me. My aim is then to monitor these tables and check they are being refreshed, so that I know all the data in our dashboard is up to date.
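For illustration, a rough sketch of that parsing idea (the service principal name is a placeholder, and the executed_by/statement_text/start_time column names are assumed from the system.query.history schema) might look like this:

import re

# Placeholder: the service principal Power BI uses to connect to Databricks.
PBI_PRINCIPAL = "powerbi-service-principal@example.com"

# Pull the SQL text of recent queries run by that principal from the query history system table.
history = spark.sql(f"""
    SELECT statement_text
    FROM system.query.history
    WHERE executed_by = '{PBI_PRINCIPAL}'
      AND start_time >= current_timestamp() - INTERVAL 1 DAY
""").collect()

# Very rough extraction of "FROM <table>" targets; a real SQL parser would be more robust.
tables = set()
for row in history:
    tables.update(re.findall(r"\bFROM\s+([\w`.]+)", row.statement_text, flags=re.IGNORECASE))

print(sorted(tables))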

WiliamRosa
New Contributor II

Hi @IONA,

Totally agree with @BigRoux. To make his point actionable, here are official docs you can use:

Query history system table (system.query.history): https://docs.databricks.com/aws/en/admin/system-tables/query-history

Query History (overview/UI): https://docs.databricks.com/aws/en/sql/user/queries/query-history

Query History REST API: https://docs.databricks.com/api/workspace/queryhistory/list

Spark Thrift Server (why JDBC/ODBC UI grids aren't exposed as tables; retention settings): https://spark.apache.org/docs/latest/configuration.html#spark-sql

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

IONA
New Contributor III

This is great thanks. I will share this knowledge with my team as well.

szymon_dybczak
Esteemed Contributor III

 

Hi @IONA,

As @BigRoux correctly suggested, there is no native way to get these stats from the JDBC/ODBC Spark UI.

1. You can try the query history system table, but it exposes a limited number of metrics:

 

%sql
SELECT *
FROM system.query.history

 

2. You can use the /api/2.0/sql/history/queries endpoint with the include_metrics flag enabled, which should return a payload like the following:

[Screenshot: example response payload showing per-query metrics]
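As a rough illustration of that call (not an official snippet: the notebook-context auth and the specific fields named in the comment are assumptions), fetching one recent query with include_metrics and inspecting its nested metrics object could look like this:

from dbruntime.databricks_repl_context import get_context
import requests

# Minimal sketch: request one recent query with include_metrics=true
# and print its nested "metrics" object.
ctx = get_context()
resp = requests.get(
    f"https://{ctx.browserHostName}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {ctx.apiToken}"},
    params={"max_results": 1, "include_metrics": "true"},
)
resp.raise_for_status()

for q in resp.json().get("res", []):
    # Fields such as total_time_ms, read_bytes and rows_produced_count
    # are typically found under "metrics".
    print(q.get("query_text"), q.get("metrics"))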

 

3. Metrics can also be obtained for the following (a rough Jobs API sketch follows this list):

  • Cluster metrics - you can export these with cluster logging. It's worth noting that Ganglia is deprecated for newer runtimes.
  • Warehouse metrics - available through the API for query metrics.
  • Jobs performance - you can use the Jobs API.
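For the jobs bullet, here is a minimal sketch (assuming the Jobs API 2.1 runs/list endpoint and notebook-context auth; field names are an assumption and may differ slightly by API version) that lists recent runs with their execution durations:

from dbruntime.databricks_repl_context import get_context
import requests

# Minimal sketch: list recent job runs and print their execution durations.
ctx = get_context()
resp = requests.get(
    f"https://{ctx.browserHostName}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {ctx.apiToken}"},
    params={"limit": 25},
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    # execution_duration is reported in milliseconds.
    print(run.get("run_id"), run.get("state", {}).get("result_state"), run.get("execution_duration"))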

4. And lastly, you can use the Apache Spark REST API monitoring endpoints, which give you access to many different metrics. Here, just for the sake of an example, I'm using it to get the environment configuration of my cluster, but there are many, many more. You can find the full list at the location below:

Monitoring and Instrumentation - Spark 4.0.0 Documentation

 

 

from dbruntime.databricks_repl_context import get_context
import requests

# Grab the workspace host, cluster id and API token from the notebook context.
context = get_context()
host = context.browserHostName
cluster_id = context.clusterId

# Spark's monitoring REST API, reached via the Databricks driver proxy (port 40001 here).
spark_ui_base_url = f"https://{host}/driver-proxy-api/o/0/{cluster_id}/40001/api/v1/"
# Application id of this cluster's Spark application, as shown in the Spark UI.
endpoint = 'applications/local-1756797804565/environment'

response = requests.get(
    spark_ui_base_url + endpoint,
    headers={"Authorization": f"Bearer {context.apiToken}"}
)

if response.status_code == 200:
    try:
        data = response.json()
        print(data)
    except requests.exceptions.JSONDecodeError:
        print("Response is not valid JSON:")
        print(response.text)
else:
    print(f"Request failed with status code: {response.status_code}")
    print(f"Response: {response.text}")

 

 

 

IONA
New Contributor III

That is great. Thanks
