
Getting data from the Spark query profiler

IONA — Fri, 29 Aug 2025 09:33:36 GMT

When you navigate to Compute > Select Cluster > Spark UI > JDBC/ODBC, you can see grids of Session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis?

Thanks

Re: Getting data from the Spark query profiler

Louis_Frolio — Mon, 01 Sep 2025 19:40:47 GMT

Hello Iona, You cannot natively query the exact Session stats and SQL stats from the JDBC/ODBC Spark UI via a simple SQL statement in Databricks today. However, advanced users and admins can access some of the underlying data via log tables (like prod.thrift_statements), Query History API, or specialized REST endpoints. For practical analysis, using the Query History API and parsing the results into Python or SQL for your analysis is the closest workaround currently available.

Hope this helps, Louis.

Re: Getting data from the Spark query profiler

WiliamRosa — Mon, 01 Sep 2025 22:59:00 GMT

Hi @IONA,

Totally agree with @Louis_Frolio — to make his point actionable, here are official docs you can use:

Query history system table (system.query.history): https://docs.databricks.com/aws/en/admin/system-tables/query-history

Query History (overview/UI): https://docs.databricks.com/aws/en/sql/user/queries/query-history

Query History REST API: https://docs.databricks.com/api/workspace/queryhistory/list

Spark Thrift Server (why JDBC/ODBC UI grids aren’t exposed as tables; retention settings): https://spark.apache.org/docs/latest/configuration.html#spark-sql

Re: Getting data from the Spark query profiler

szymon_dybczak — Tue, 02 Sep 2025 08:01:17 GMT

Hi @IONA ,

As @Louis_Frolio correctly suggested, there is no native way to get stats from the JDBC/ODBC Spark UI.

1. You can try the query history system table, but it has a limited number of metrics:

%sql SELECT * FROM system.query.history

2. You can use the /api/2.0/sql/history/queries endpoint with the include_metrics flag enabled, which should return per-query metrics in the response payload.
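A minimal sketch of calling that endpoint from Python and flattening each record for analysis. It uses only the standard library. The response field names (`res`, `has_next_page`, `next_page_token`, `metrics.total_time_ms`, `metrics.rows_produced_count`) are my reading of the Query History API docs and should be checked against your workspace; `fetch_queries` and `flatten` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def fetch_queries(host, token, max_results=100):
    """Yield raw query records, following pagination until exhausted.

    Assumes the response carries records under "res" and pagination
    under "has_next_page" / "next_page_token".
    """
    base = f"https://{host}/api/2.0/sql/history/queries"
    params = {"include_metrics": "true", "max_results": str(max_results)}
    while True:
        req = urllib.request.Request(
            f"{base}?{urllib.parse.urlencode(params)}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = json.load(resp)
        yield from body.get("res", [])
        if not body.get("has_next_page"):
            return
        params["page_token"] = body["next_page_token"]


def flatten(record):
    """Keep a few fields of interest; keys that are absent become None."""
    metrics = record.get("metrics") or {}
    return {
        "query_id": record.get("query_id"),
        "user_name": record.get("user_name"),
        "status": record.get("status"),
        "query_text": record.get("query_text"),
        "total_time_ms": metrics.get("total_time_ms"),
        "rows_produced": metrics.get("rows_produced_count"),
    }
```

Something like `rows = [flatten(r) for r in fetch_queries(host, token)]` would then give a list of dicts that can be loaded into a DataFrame.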

3. Metrics can also be obtained for the following:

Cluster metrics - you can export these with cluster logging. Note that Ganglia is deprecated on newer runtimes
Warehouse metrics - available through the API for query metrics
Jobs performance - you can use the Jobs API

4. And lastly, you can use the Apache Spark REST API monitoring endpoints, which give you access to many different metrics. Here, just for the sake of an example, I'm using it to get the environment configuration of my cluster, but there are many more metrics. You can find the full list at the location below:

Monitoring and Instrumentation - Spark 4.0.0 Documentation

from dbruntime.databricks_repl_context import get_context
import requests

# The notebook context supplies the workspace host, cluster ID and an API token
context = get_context()
host = context.browserHostName
cluster_id = context.clusterId

# The Spark UI REST API is reached through the driver proxy
spark_ui_base_url = f"https://{host}/driver-proxy-api/o/0/{cluster_id}/40001/api/v1/"
# The application ID below is specific to my cluster; replace it with your own
endpoint = 'applications/local-1756797804565/environment'

response = requests.get(
    spark_ui_base_url + endpoint,
    headers={"Authorization": f"Bearer {context.apiToken}"}
)

if response.status_code == 200:
    try:
        data = response.json()
        print(data)
    except requests.exceptions.JSONDecodeError:
        print("Response is not valid JSON:")
        print(response.text)
else:
    print(f"Request failed with status code: {response.status_code}")
    print(f"Response: {response.text}")
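The application ID in the snippet above is hard-coded. A small sketch of how it could be looked up first: the Spark monitoring REST API lists applications under `applications`, and each entry has an `id` field. The helper names `spark_api_url` and `first_application_id` are hypothetical, and the `o/0` and `40001` defaults are simply copied from the example above, so treat them as assumptions for your own workspace.

```python
def spark_api_url(host, cluster_id, path, org_id="0", port=40001):
    """Build a driver-proxy URL for a Spark monitoring REST API path."""
    return (
        f"https://{host}/driver-proxy-api/o/{org_id}/{cluster_id}/{port}"
        f"/api/v1/{path.lstrip('/')}"
    )


def first_application_id(applications):
    """Return the ID of the first application in an `applications` response.

    `applications` is the decoded JSON list returned by the
    `applications` endpoint.
    """
    if not applications:
        raise ValueError("no Spark applications returned")
    return applications[0]["id"]


# Usage inside a notebook (not executed here), reusing names from the example above:
#   apps = requests.get(spark_api_url(host, cluster_id, "applications"),
#                       headers={"Authorization": f"Bearer {context.apiToken}"}).json()
#   endpoint = f"applications/{first_application_id(apps)}/environment"
```

Inside a notebook, `spark.sparkContext.applicationId` should give the same ID without an extra request.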

Re: Getting data from the Spark query profiler

IONA — Tue, 02 Sep 2025 09:22:33 GMT

Great info. Thank you ever so much.

My actual need is to find out programmatically which tables in Databricks are being used by a Power BI dashboard. If you open Power BI itself you can see the data model and list the tables. I would have thought that one of the Power BI REST API endpoints would have given this info, since you can do things through it such as set off a refresh, but it seems that is not the case. So another approach would be to start at the Databricks end and examine what requests are made of it. Looking at the Spark info, I can see the queries hitting the database, and by looking at the user/service principal I can see what is making those requests. So by parsing the SQL statement, which for a refresh will be "Select * from <<SometableInPowerBI>>", I will be able to say aha, that's a table of interest to me. My aim is then to monitor these tables to check that they are being refreshed, so that I know all the data in our dashboard is up to date.
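As a rough sketch of the parsing step described above, something like the following could pull table names out of captured SQL text. It is a regex heuristic, not a SQL parser: it only looks at the identifier that follows FROM or JOIN, and it will also pick up CTE names or text inside string literals. The function name `referenced_tables` is made up for illustration.

```python
import re

# One identifier part: backtick-quoted or a plain name.
_IDENT = r"(?:`[^`]+`|[A-Za-z_]\w*)"
# The identifier after FROM or JOIN, with up to three dot-separated parts
# (catalog.schema.table).
_TABLE_REF = re.compile(
    rf"\b(?:FROM|JOIN)\s+({_IDENT}(?:\.{_IDENT}){{0,2}})",
    re.IGNORECASE,
)


def referenced_tables(sql):
    """Return the sorted, lower-cased table names referenced in a SQL string."""
    return sorted(
        {m.group(1).replace("`", "").lower() for m in _TABLE_REF.finditer(sql)}
    )
```

Running this over the query text of every statement issued by the Power BI service principal and taking the union of the results would give a candidate list of tables to monitor.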

Re: Getting data from the Spark query profiler

IONA — Tue, 02 Sep 2025 09:23:28 GMT

That is great. Thanks

Re: Getting data from the Spark query profiler

IONA — Wed, 03 Sep 2025 11:15:01 GMT

This is great thanks. I will share this knowledge with my team as well.