<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Getting data from the Spark query profiler in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130455#M48800</link>
    <description></description>
    <pubDate>Tue, 02 Sep 2025 08:01:17 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-09-02T08:01:17Z</dc:date>
    <item>
      <title>Getting data from the Spark query profiler</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130137#M48714</link>
      <description>&lt;P&gt;When you navigate to Compute &amp;gt; (your cluster) &amp;gt; Spark UI &amp;gt; JDBC/ODBC, you can see grids of Session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 29 Aug 2025 09:33:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130137#M48714</guid>
      <dc:creator>IONA</dc:creator>
      <dc:date>2025-08-29T09:33:36Z</dc:date>
    </item>
    <item>
      <title>Re: Getting data from the Spark query profiler</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130420#M48787</link>
      <description>&lt;P&gt;Hello Iona, you &lt;STRONG&gt;cannot natively query the exact Session stats and SQL stats&lt;/STRONG&gt; from the JDBC/ODBC tab of the Spark UI via a simple SQL statement in Databricks today. However, advanced users and admins can access some of the underlying data via log tables (like prod.thrift_statements), the Query History API, or specialized REST endpoints. For practical analysis, using the Query History API and parsing the results into Python or SQL is the closest workaround currently available.&lt;/P&gt;
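&lt;P&gt;A minimal sketch of that workaround (not an official Databricks sample): it pages through the &lt;STRONG&gt;/api/2.0/sql/history/queries&lt;/STRONG&gt; endpoint with &lt;STRONG&gt;include_metrics&lt;/STRONG&gt; enabled and flattens each entry into one row per query. It assumes you pass in your workspace URL and a personal access token. The response and metric field names it reads (res, has_next_page, next_page_token, query_text, user_name, rows_produced_count, read_bytes) are based on the Query History API reference, so please check them against what your workspace actually returns.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import json
import urllib.parse
import urllib.request

HISTORY_PATH = "/api/2.0/sql/history/queries"


def fetch_query_history(host, token, max_pages=10):
    """Page through the SQL Query History API and return the raw entries.

    host: workspace URL, e.g. "https://my-workspace.cloud.databricks.com" (hypothetical).
    token: a personal access token (assumption: PAT authentication).
    """
    params = {"max_results": 100, "include_metrics": "true"}
    entries = []
    for _ in range(max_pages):
        url = f"{host}{HISTORY_PATH}?{urllib.parse.urlencode(params)}"
        request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(request, timeout=60) as resp:
            body = json.load(resp)
        entries.extend(body.get("res", []))
        if not body.get("has_next_page"):
            break
        # Later pages are addressed by the token returned with the previous page
        params = {"page_token": body["next_page_token"], "include_metrics": "true"}
    return entries


def flatten_entry(entry):
    """Turn one history entry into a flat dict (one row per query)."""
    metrics = entry.get("metrics") or {}
    return {
        "query_id": entry.get("query_id"),
        "user_name": entry.get("user_name"),
        "status": entry.get("status"),
        "query_text": entry.get("query_text"),
        "start_time_ms": entry.get("query_start_time_ms"),
        "duration_ms": entry.get("duration"),
        "rows_produced": metrics.get("rows_produced_count"),
        "read_bytes": metrics.get("read_bytes"),
    }


# Example usage (host and token are placeholders):
# rows = [flatten_entry(e) for e in fetch_query_history(host, token)]
# import pandas as pd; df = pd.DataFrame(rows)&lt;/LI-CODE&gt;&lt;P&gt;Capping max_pages keeps a first exploratory run small; raise it, or add one of the filters described in the API reference, when you need a longer history window.&lt;/P&gt;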
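&lt;P&gt;Going one step further with that parsing, for example to see which tables a given user or service principal reads, here is a rough sketch. It takes history entries as dicts with user_name and query_text keys (for example, entries returned by the Query History API) and collects the table names that follow FROM or JOIN. The regular expression is a deliberate simplification: it does not understand CTEs, subqueries or comments, so treat its output as a starting point rather than a full SQL parser. The principal name in the usage comment is hypothetical.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import re
from collections import defaultdict

# Identifier after FROM or JOIN, e.g. main.sales.orders or `main`.`sales`.`orders`.
# Simplification: CTE names, subqueries and comments are not handled.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+((?:`[^`]+`|\w+)(?:\.(?:`[^`]+`|\w+))*)",
    re.IGNORECASE,
)


def referenced_tables(query_text):
    """Return the set of table names after FROM/JOIN, lower-cased, backticks removed."""
    return {m.group(1).replace("`", "").lower() for m in TABLE_REF.finditer(query_text or "")}


def tables_by_principal(entries, principal=None):
    """Map each user/service principal to the set of tables its queries read.

    entries: iterable of dicts with "user_name" and "query_text" keys.
    principal: optional name; when given, only that principal's queries are kept.
    """
    result = defaultdict(set)
    for entry in entries:
        user = entry.get("user_name")
        if principal is not None and user != principal:
            continue
        result[user] |= referenced_tables(entry.get("query_text"))
    return dict(result)


# Example usage ("my-powerbi-sp" is a hypothetical service principal name):
# tables_by_principal(entries, principal="my-powerbi-sp")&lt;/LI-CODE&gt;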
&lt;P&gt;Hope this helps, Louis.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 19:40:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130420#M48787</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-09-01T19:40:47Z</dc:date>
    </item>
    <item>
      <title>Re: Getting data from the Spark query profiler</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130430#M48794</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114193"&gt;@IONA&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Totally agree with &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;. To make his point actionable, here are the official docs you can use:&lt;/P&gt;&lt;P&gt;Query history system table (system.query.history): &lt;A href="https://docs.databricks.com/aws/en/admin/system-tables/query-history" target="_blank"&gt;https://docs.databricks.com/aws/en/admin/system-tables/query-history&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Query History (overview/UI): &lt;A href="https://docs.databricks.com/aws/en/sql/user/queries/query-history" target="_blank"&gt;https://docs.databricks.com/aws/en/sql/user/queries/query-history&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Query History REST API: &lt;A href="https://docs.databricks.com/api/workspace/queryhistory/list" target="_blank"&gt;https://docs.databricks.com/api/workspace/queryhistory/list&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Spark Thrift Server (why JDBC/ODBC UI grids aren’t exposed as tables; retention settings): &lt;A href="https://spark.apache.org/docs/latest/configuration.html#spark-sql" target="_blank"&gt;https://spark.apache.org/docs/latest/configuration.html#spark-sql&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 22:59:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130430#M48794</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-09-01T22:59:00Z</dc:date>
    </item>
    <item>
      <title>Re: Getting data from the Spark query profiler</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130455#M48800</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114193"&gt;@IONA&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;As&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt; correctly pointed out, there is no native way to get the stats shown in the JDBC/ODBC tab of the Spark UI.&lt;BR /&gt;&lt;BR /&gt;1. You can try the query history system table, but it exposes only a limited set of metrics:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%sql
SELECT *
FROM system.query.history&lt;/LI-CODE&gt;&lt;P&gt;2. You can call the &lt;STRONG&gt;/api/2.0/sql/history/queries&lt;/STRONG&gt; endpoint with the &lt;STRONG&gt;include_metrics&lt;/STRONG&gt; flag enabled, which should return a payload like the following:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1756799269340.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19559i378B8FBFA37418B8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1756799269340.png" alt="szymon_dybczak_0-1756799269340.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;3. Metrics can also be obtained for the following:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Cluster metrics: you can export these with cluster logging. Note that Ganglia is deprecated on newer runtimes.&lt;/LI&gt;&lt;LI&gt;Warehouse metrics: available through the query history API (with metrics enabled).&lt;/LI&gt;&lt;LI&gt;Jobs performance: you can use the Jobs API.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;4. Lastly, you can use the Apache Spark REST API monitoring endpoints, which give you access to many different metrics. Just for the sake of example, I'm using them below to get the environment configuration of my cluster, but many more metrics are available. You can find the full list here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/monitoring.html#rest-api" target="_blank" rel="noopener"&gt;Monitoring and Instrumentation - Spark 4.0.0 Documentation&lt;/A&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from dbruntime.databricks_repl_context import get_context
import requests

# Notebook context of the running cluster (Databricks-internal helper)
context = get_context()
host = context.browserHostName
cluster_id = context.clusterId
headers = {"Authorization": f"Bearer {context.apiToken}"}

# Spark UI REST API reached through the driver proxy (40001 is the driver's Spark UI port)
spark_ui_base_url = f"https://{host}/driver-proxy-api/o/0/{cluster_id}/40001/api/v1/"

# Look up the application ID instead of hard-coding it: it changes whenever the cluster restarts
apps = requests.get(spark_ui_base_url + "applications", headers=headers, timeout=30)
apps.raise_for_status()
app_id = apps.json()[0]["id"]
endpoint = f"applications/{app_id}/environment"

response = requests.get(spark_ui_base_url + endpoint, headers=headers, timeout=30)

if response.status_code == 200:
    try:
        data = response.json()
        print(data)
    except requests.exceptions.JSONDecodeError:
        print("Response is not valid JSON:")
        print(response.text)
else:
    print(f"Request failed with status code: {response.status_code}")
    print(f"Response: {response.text}")&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 02 Sep 2025 08:01:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130455#M48800</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-02T08:01:17Z</dc:date>
    </item>
    <item>
      <title>Re: Getting data from the Spark query profiler</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130475#M48804</link>
      <description>&lt;P&gt;Great info. Thank you ever so much.&lt;/P&gt;&lt;P&gt;My actual need is to find out programmatically which tables in Databricks are being used by a Power BI dashboard. If you open Power BI itself, you can see the data model and list the tables. I would have thought one of the Power BI REST API endpoints would give this info, since you can do things through it such as triggering a refresh, but it seems that is not the case. So another approach would be to start at the Databricks end and examine what requests are made of it. Looking at the Spark info, I can see the queries hitting the database, and by looking at the user/service principal I can see what is making those requests. So by parsing the SQL statement, which for a refresh will be "Select * from &amp;lt;&amp;lt;SometableInPowerBI&amp;gt;&amp;gt;", I will be able to say: aha, that's a table of interest to me. My aim is then to monitor these tables to check that they are being refreshed, so that I know all the data in our dashboard is up to date.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 09:22:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130475#M48804</guid>
      <dc:creator>IONA</dc:creator>
      <dc:date>2025-09-02T09:22:33Z</dc:date>
    </item>
    <item>
      <title>Re: Getting data from the Spark query profiler</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130477#M48805</link>
      <description>&lt;P&gt;That is great. Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 09:23:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130477#M48805</guid>
      <dc:creator>IONA</dc:creator>
      <dc:date>2025-09-02T09:23:28Z</dc:date>
    </item>
    <item>
      <title>Re: Getting data from the Spark query profiler</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130640#M48858</link>
      <description>&lt;P&gt;This is great, thanks. I will share this knowledge with my team as well.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Sep 2025 11:15:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-data-from-the-spark-query-profiler/m-p/130640#M48858</guid>
      <dc:creator>IONA</dc:creator>
      <dc:date>2025-09-03T11:15:01Z</dc:date>
    </item>
  </channel>
</rss>

