Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unable to access metrics from Driver node on localhost:4040

vishal_balaji
Visitor

Greetings,

I am trying to set up monitoring in Grafana for all my Databricks clusters.

I have added two things as part of this:

Under Compute > Configuration > Advanced > Spark > Spark Config, I have added:
spark.ui.prometheus.enabled true

Under init_scripts, I have this script:

#!/bin/bash

cat > /databricks/spark/conf/jmxCollector.yaml <<EOF
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["*:*"]
EOF

cat >> /databricks/spark/conf/metrics.properties <<EOF
# Enable Prometheus for all instances by class name
driver.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
executor.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
driver.sink.prometheusServlet.path=/metrics/prometheus
executor.sink.prometheusServlet.path=/metrics/executor/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus

*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource

# *.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
# *.sink.console.period=120
# driver.sink.console.unit=seconds
EOF
 
However, I am not able to access these metrics on localhost:4040 when I connect to the cluster. Running curl against localhost:4040 gives:
curl: (7) Failed to connect to localhost port 4040 after 1 ms: Couldn't connect to server
 
Connecting directly to the driver IP gives an empty response:

* Connected to 10.4.86.136 (10.4.86.136) port 37479
> GET /metrics/prometheus HTTP/1.1
> Host: 10.4.86.136:37479
> User-Agent: curl/8.5.0
> Accept: */*
* Empty reply from server
* Closing connection
curl: (52) Empty reply from server

 
  1. Am I configuring something wrong here? Why is the endpoint not reachable via localhost:4040, as mentioned in the docs - https://spark.apache.org/docs/latest/monitoring.html#metrics
  2. Why am I getting an empty response from DRIVER_IP/metrics/prometheus? I found that suggestion here - https://stackoverflow.com/questions/70989641/spark-executor-metrics-dont-reach-prometheus-sink
  3. If I have to access this only through the DRIVER_IP, how do I get access to it within the context of the init_script?
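On Databricks the driver UI does not necessarily listen on port 4040; from a notebook attached to the same cluster, the actual address can be read from the SparkContext via the standard PySpark property `sc.uiWebUrl`. A minimal sketch (the helper `metrics_url` is hypothetical, added here only for illustration):

```python
def metrics_url(ui_web_url: str, path: str = "/metrics/prometheus") -> str:
    """Join the Spark UI base URL with the PrometheusServlet path.

    `metrics_url` is a hypothetical helper; `ui_web_url` would come from
    `sc.uiWebUrl` inside a Databricks notebook on the cluster.
    """
    return ui_web_url.rstrip("/") + path

# In a notebook on the cluster (assuming the sink is configured), one could then:
#   import urllib.request
#   print(urllib.request.urlopen(metrics_url(sc.uiWebUrl)).read().decode()[:500])
print(metrics_url("http://10.4.86.136:37479"))
```

The IP and port above are taken from the curl attempt in the post; in practice they differ per cluster, which is why reading `sc.uiWebUrl` at runtime is preferable to hard-coding 4040.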

szymon_dybczak
Esteemed Contributor III

Hi @vishal_balaji ,

You're following guides written for OSS Apache Spark. localhost won't work in this case because in Databricks all compute is cloud-based, so the Spark UI runs on the remote driver, not on your machine.

Please follow the guide below on how to configure this properly on Databricks:

Databricks Observability using Grafana and Prometheus

Hi @szymon_dybczak ,

Thanks for the quick response. We initially tried making Pushgateway work, but it seems to be designed for tracking metrics from ephemeral batch jobs.

We are trying to track metrics for streaming jobs, which Pushgateway cannot handle: it stores all metrics in memory and quickly runs out of memory on the host machine.
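For long-running streaming jobs, a pull-based setup avoids Pushgateway's in-memory accumulation: Prometheus scrapes the driver's PrometheusServlet endpoint directly. A sketch of a scrape config, assuming the driver IP and port from the curl attempt above and that Prometheus has network access to the driver:

```yaml
# Hypothetical prometheus.yml fragment; the target must be the driver's
# actual IP and Spark UI port, which differ per cluster and must be
# discovered at cluster start (e.g. registered by an init script).
scrape_configs:
  - job_name: "databricks-spark-driver"
    metrics_path: /metrics/prometheus
    scrape_interval: 30s
    static_configs:
      - targets: ["10.4.86.136:37479"]
```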
