<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unable to access metrics from Driver node on localhost:4040 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-access-metrics-from-driver-node-on-localhost-4040/m-p/133238#M49754</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thanks for the quick response. We initially tried making Pushgateway work, but it seems to be designed for tracking metrics from ephemeral batch jobs.&lt;/P&gt;&lt;P&gt;We are trying to track metrics for streaming jobs, which Pushgateway cannot handle because it stores all metrics in memory and quickly runs out of memory on the host machine.&lt;/P&gt;</description>
    <pubDate>Mon, 29 Sep 2025 10:31:56 GMT</pubDate>
    <dc:creator>vishal_balaji</dc:creator>
    <dc:date>2025-09-29T10:31:56Z</dc:date>
    <item>
      <title>Unable to access metrics from Driver node on localhost:4040</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-access-metrics-from-driver-node-on-localhost-4040/m-p/133231#M49750</link>
      <description>&lt;P&gt;Greetings,&lt;/P&gt;&lt;P&gt;I am trying to set up monitoring in Grafana for all my Databricks clusters.&lt;/P&gt;&lt;P&gt;I have added two things as part of this.&lt;/P&gt;&lt;P&gt;Under Compute &amp;gt; Configuration &amp;gt; Advanced &amp;gt; Spark &amp;gt; Spark Config, I have added:&lt;BR /&gt;spark.ui.prometheus.enabled true&lt;/P&gt;&lt;P&gt;Under init_scripts, I have this script:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;#!/bin/bash&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;cat &amp;gt; /databricks/spark/conf/jmxCollector.yaml &amp;lt;&amp;lt;EOF&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;lowercaseOutputName: false&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;lowercaseOutputLabelNames: false&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;whitelistObjectNames: ["*:*"]&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;EOF&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;cat &amp;gt;&amp;gt; /databricks/spark/conf/metrics.properties &amp;lt;&amp;lt;EOF&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# Enable Prometheus for all instances by class name&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;driver.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;executor.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;driver.sink.prometheusServlet.path=/metrics/prometheus&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;executor.sink.prometheusServlet.path=/metrics/executor/prometheus&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;master.sink.prometheusServlet.path=/metrics/master/prometheus&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;applications.sink.prometheusServlet.path=/metrics/applications/prometheus&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;*.source.jvm.class=org.apache.spark.metrics.source.JvmSource&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# *.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# *.sink.console.period=120&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# driver.sink.console.unit=seconds&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;EOF&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;However, I am not able to access these metrics on localhost:4040 when I connect to the cluster. Running&lt;/P&gt;&lt;DIV&gt;&lt;SPAN&gt;curl &lt;A href="http://localhost:4040/metrics/prometheus/" target="_blank" rel="noopener"&gt;http://localhost:4040/metrics/prometheus/&lt;/A&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;P&gt;gives&lt;/P&gt;&lt;DIV&gt;curl: (7) Failed to connect to localhost port 4040 after 1 ms: Couldn't connect to server&lt;/DIV&gt;&lt;P&gt;Connecting directly to the driver IP gives an empty response:&lt;/P&gt;&lt;DIV&gt;&lt;SPAN&gt;curl -v &lt;A href="http://10.4.86.136:37479/metrics/prometheus" target="_blank" rel="noopener"&gt;http://10.4.86.136:37479/metrics/prometheus&lt;/A&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;* Connected to 10.4.86.136 (10.4.86.136) port 37479&lt;/DIV&gt;&lt;DIV&gt;&amp;gt; GET /metrics/prometheus HTTP/1.1&lt;/DIV&gt;&lt;DIV&gt;&amp;gt; Host: 10.4.86.136:37479&lt;/DIV&gt;&lt;DIV&gt;&amp;gt; User-Agent: curl/8.5.0&lt;/DIV&gt;&lt;DIV&gt;&amp;gt; Accept: */*&lt;/DIV&gt;&lt;DIV&gt;* Empty reply from server&lt;/DIV&gt;&lt;DIV&gt;* Closing connection&lt;/DIV&gt;&lt;DIV&gt;curl: (52) Empty reply from server&lt;/DIV&gt;&lt;OL&gt;&lt;LI&gt;Am I configuring something wrong here? Why is the endpoint not reachable via localhost:4040 as described in the docs:&amp;nbsp;&lt;A href="https://spark.apache.org/docs/latest/monitoring.html#metrics" target="_blank" rel="noopener"&gt;https://spark.apache.org/docs/latest/monitoring.html#metrics&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Why am I getting an empty response from DRIVER_IP/metrics/prometheus? I tried that based on this Stack Overflow question:&amp;nbsp;&lt;A href="https://stackoverflow.com/questions/70989641/spark-executor-metrics-dont-reach-prometheus-sink" target="_blank" rel="noopener"&gt;https://stackoverflow.com/questions/70989641/spark-executor-metrics-dont-reach-prometheus-sink&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;If I can only access this through the DRIVER_IP, how do I get access to it from within the init script?&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Mon, 29 Sep 2025 08:55:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-access-metrics-from-driver-node-on-localhost-4040/m-p/133231#M49750</guid>
      <dc:creator>vishal_balaji</dc:creator>
      <dc:date>2025-09-29T08:55:56Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access metrics from Driver node on localhost:4040</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-access-metrics-from-driver-node-on-localhost-4040/m-p/133237#M49753</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/187597"&gt;@vishal_balaji&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;You're following guides that were written for OSS Apache Spark. localhost will not work in this case because in Databricks all compute is cloud-based.&lt;/P&gt;&lt;P&gt;Please follow the guide below to configure it properly on Databricks:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/t5/technical-blog/databricks-observability-using-grafana-and-prometheus/ba-p/96849" target="_blank"&gt;Databricks Observability using Grafana and Prometheus&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Sep 2025 10:04:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-access-metrics-from-driver-node-on-localhost-4040/m-p/133237#M49753</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-29T10:04:34Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access metrics from Driver node on localhost:4040</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-access-metrics-from-driver-node-on-localhost-4040/m-p/133238#M49754</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thanks for the quick response. We initially tried making Pushgateway work, but it seems to be designed for tracking metrics from ephemeral batch jobs.&lt;/P&gt;&lt;P&gt;We are trying to track metrics for streaming jobs, which Pushgateway cannot handle because it stores all metrics in memory and quickly runs out of memory on the host machine.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Sep 2025 10:31:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-access-metrics-from-driver-node-on-localhost-4040/m-p/133238#M49754</guid>
      <dc:creator>vishal_balaji</dc:creator>
      <dc:date>2025-09-29T10:31:56Z</dc:date>
    </item>
  </channel>
</rss>

