Hi @Shahe how are you?
The standard way of doing this is to first configure Spark to enable Prometheus by adding the following to the cluster's Spark config:
spark.ui.prometheus.enabled true
spark.metrics.namespace <app_name>
We recommend replacing <app_name> with the job name when using job clusters.
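Once the cluster is (re)started with these settings, you can sanity-check them from a notebook. A minimal check (expected values in the comments):

print(spark.conf.get("spark.ui.prometheus.enabled"))  # expect "true"
print(spark.conf.get("spark.metrics.namespace"))      # expect your <app_name>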
Then, create an init script and attach it to the cluster:
dbutils.fs.put(
    "dbfs:/dlyle/install-prometheus.sh",
    """#!/bin/bash
# Expose Spark metrics through the built-in PrometheusServlet sink.
cat <<EOF > /databricks/spark/conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF
# Enable streaming and application-status metric sources on the driver.
cat <<EOF > /databricks/driver/conf/00-custom-spark.conf
[driver] {
  spark.sql.streaming.metricsEnabled = "true"
  spark.metrics.appStatusSource.enabled = "true"
}
EOF
""",
    True,
)
Confirm the driver port by running:
spark.conf.get("spark.ui.port")
Make a note of the driver port. You'll need this when setting up the scrape target.
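Once the cluster has restarted with the init script attached, you can also hit the servlet locally from a notebook to confirm it is serving metrics. A minimal sketch (requests ships with the Databricks runtime):

import requests

port = spark.conf.get("spark.ui.port")
resp = requests.get(f"http://localhost:{port}/metrics/prometheus/")
print(resp.status_code)                       # expect 200
print("\n".join(resp.text.splitlines()[:5]))  # first few metric lines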
- Generate a personal access token (from User Settings in your Databricks workspace).
- Add the following to your prometheus_values.yaml (a quick way to test this scrape target by hand is sketched after the snippet):
extraScrapeConfigs: |
  - job_name: '<cluster_name>'
    metrics_path: /driver-proxy-api/o/0/<cluster_id>/<spark_ui_port>/metrics/prometheus/
    static_configs:
      - targets:
          - <workspace_url>
    authorization:
      # Sets the authentication type of the request.
      type: Bearer
      credentials: <personal access token>
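Before updating Helm, it can save a debugging loop to test the exact URL Prometheus will scrape. A sketch using the same placeholders (all values are yours to fill in):

import requests

workspace_url = "https://<workspace_url>"
cluster_id = "<cluster_id>"
spark_ui_port = "<spark_ui_port>"  # the driver port noted earlier
token = "<personal access token>"

url = f"{workspace_url}/driver-proxy-api/o/0/{cluster_id}/{spark_ui_port}/metrics/prometheus/"
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code)  # 200 means Prometheus should be able to scrape it too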
Update your Helm installation: helm upgrade <installation_name, e.g. dlyle-prometheus> -f prometheus_values.yaml -n <namespace, e.g. prometheus-monitoring> --create-namespace prometheus-community/prometheus
The Prometheus server has a config-reload sidecar that should pick up the change automatically. You can also force a restart: kubectl rollout restart deployment <installation_name> -n <namespace>
At this point, you should be able to see your cluster as a scrape target in the Prometheus server UI (Status > Targets).
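You can also confirm this against the Prometheus HTTP API. A minimal sketch, assuming you have port-forwarded the server service to localhost:9090 first (e.g. kubectl port-forward svc/<installation_name>-server 9090:80 -n <namespace>):

import requests

targets = requests.get("http://localhost:9090/api/v1/targets").json()
for t in targets["data"]["activeTargets"]:
    print(t["labels"].get("job"), t["health"], t.get("lastError") or "ok")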
Follow this link for more reference:
https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html