Greetings,
I am trying to set up monitoring in Grafana for all my Databricks clusters.
I have added two things as part of this:
Under Compute > Configuration > Advanced > Spark > Spark Config, I have added
spark.ui.prometheus.enabled true
Under init_scripts, I have this script
#!/bin/bash
# Write a config for the Prometheus JMX exporter that exposes all MBeans
cat > /databricks/spark/conf/jmxCollector.yaml <<EOF
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["*:*"]
EOF
cat >> /databricks/spark/conf/metrics.properties <<EOF
# Enable the Prometheus servlet sink for all instances by class name
driver.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
executor.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
driver.sink.prometheusServlet.path=/metrics/prometheus
executor.sink.prometheusServlet.path=/metrics/executor/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
# Also expose metrics over JMX and register JVM metrics as a source
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
# Optional console sink, left disabled
# *.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
# *.sink.console.period=120
# driver.sink.console.unit=seconds
EOF
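In case it's relevant, here is the kind of sanity check I'm assuming can be run on the driver (e.g. via the cluster web terminal) to confirm the init script actually appended the sinks to the same metrics.properties file used above:

# Check that the Prometheus servlet sinks were appended by the init script
grep -n "prometheusServlet" /databricks/spark/conf/metrics.properties
# Check that the JMX exporter config was written
ls -l /databricks/spark/conf/jmxCollector.yaml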
However, I am not able to access these metrics on localhost:4040 when I connect to the cluster. Running curl against localhost:4040 gives:
curl: (7) Failed to connect to localhost port 4040 after 1 ms: Couldn't connect to server
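Since 4040 is only the default Spark UI port, my assumption is that the UI might be bound to a different port on Databricks. This is the kind of check I was planning to run on the driver to find the actual port (assuming ss and sudo are available on the image) and then probe the metrics paths; UI_PORT is just a placeholder:

# List the ports the driver JVM is listening on
sudo ss -tlnp | grep java
# Then probe the candidate ports (UI_PORT is whatever shows up above)
curl -v http://localhost:${UI_PORT}/metrics/prometheus
curl -v http://localhost:${UI_PORT}/metrics/executors/prometheus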
Directly connecting to the driver IP instead gives an empty response:
* Connected to 10.4.86.136 (10.4.86.136) port 37479
> GET /metrics/prometheus HTTP/1.1
> Host: 10.4.86.136:37479
> User-Agent: curl/8.5.0
> Accept: */*
* Empty reply from server
* Closing connection
curl: (52) Empty reply from server
- Am I configuring something wrong here? Why is the endpoint not reachable via localhost:4040, as mentioned in the docs - https://spark.apache.org/docs/latest/monitoring.html#metrics
- Why am I getting an empty response from DRIVER_IP/metrics/prometheus? I got the idea to try that from here - https://stackoverflow.com/questions/70989641/spark-executor-metrics-dont-reach-prometheus-sink
- If I have to access this only through the DRIVER_IP, how do I get access to it from within the context of the init script? (A rough sketch of what I had in mind is below.)
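For that last point, what I had in mind is roughly the following inside the init script, using the DB_IS_DRIVER and DB_DRIVER_IP environment variables that I believe Databricks exposes to init scripts; UI_PORT is again a placeholder because I still don't know which port the endpoint actually lives on:

#!/bin/bash
# Hypothetical sketch, not working code: run only on the driver node and
# record the driver's metrics endpoint for a locally installed scraper/agent.
# DB_IS_DRIVER / DB_DRIVER_IP are the env vars I believe init scripts receive.
if [[ "${DB_IS_DRIVER:-FALSE}" == "TRUE" ]]; then
  echo "Driver IP seen from the init script: ${DB_DRIVER_IP}"
  # e.g. write a scrape target for an agent running on the driver:
  # echo "http://${DB_DRIVER_IP}:${UI_PORT}/metrics/prometheus" > /tmp/scrape_target
fi

If that's not the intended way to reach the driver endpoint from an init script, pointers on the right approach would be appreciated.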