Unable to access metrics from Driver node on localhost:4040
09-29-2025 01:55 AM
Greetings,
I am trying to set up monitoring in Grafana for all my Databricks clusters.
I have added two things as part of this:
Under Compute > Configuration > Advanced > Spark > Spark Config, I have added
spark.ui.prometheus.enabled true
Under init_scripts, I have this script
* Connected to 10.4.86.136 (10.4.86.136) port 37479
> GET /metrics/prometheus HTTP/1.1
> Host: 10.4.86.136:37479
> User-Agent: curl/8.5.0
> Accept: */*
<
* Empty reply from server
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Closing connection
curl: (52) Empty reply from server
- Am I configuring something wrong here? Why is the endpoint not reachable via localhost:4040, as described in the docs - https://spark.apache.org/docs/latest/monitoring.html#metrics
- Why am I getting an empty response from DRIVER_IP/metrics/prometheus? I tried that endpoint based on this question - https://stackoverflow.com/questions/70989641/spark-executor-metrics-dont-reach-prometheus-sink
- If I have to access this only through the DRIVER_IP, how do I get access to this within the context of the init_script?
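For reference when debugging the questions above, here is a rough Python sketch of the two Prometheus paths described in the OSS Spark monitoring docs. `/metrics/executors/prometheus` is the path that `spark.ui.prometheus.enabled` turns on, while `/metrics/prometheus` is served only when the `PrometheusServlet` sink is also configured. The helper names `metrics_urls` and `probe` are made up for illustration, and whether the driver UI address is reachable from an init script or a notebook on Databricks is an assumption that has not been verified here.

```python
from urllib.request import urlopen


def metrics_urls(ui_base_url):
    """Build the two Prometheus endpoint URLs under a Spark UI base URL.

    Hypothetical helper: the paths come from the OSS Spark monitoring docs,
    not from anything Databricks-specific.
    """
    base = ui_base_url.rstrip("/")
    return {
        # Exposed when spark.ui.prometheus.enabled is true
        "executors": base + "/metrics/executors/prometheus",
        # Exposed only when the PrometheusServlet metrics sink is configured
        "servlet": base + "/metrics/prometheus",
    }


def probe(url, timeout=5.0):
    """Fetch a URL and return (HTTP status, body text). Raises on network errors."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


# Possible usage from a notebook attached to the cluster (assumption: the
# address returned by uiWebUrl is reachable from where this code runs):
#   urls = metrics_urls(spark.sparkContext.uiWebUrl)
#   for name, url in urls.items():
#       print(name, probe(url)[0])
```

If the `executors` URL responds but the `servlet` one does not, that would suggest the sink configuration is missing rather than a networking problem.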
09-29-2025 03:04 AM
Hi @vishal_balaji ,
You're following guides that were written for OSS Apache Spark. localhost won't work in this case because in Databricks all compute is cloud-based.
Please follow the guide below on how to configure it properly on Databricks:
09-29-2025 03:31 AM
Hi @szymon_dybczak ,
Thanks for the quick response. We initially tried to make Pushgateway work, but it seems to be designed for tracking metrics from ephemeral batch jobs.
We are trying to track metrics for streaming jobs, which the Pushgateway cannot handle: it stores all metrics in memory and quickly runs out of memory on the host machine.