Azure Databricks Metrics to Prometheus?
06-04-2024 01:43 AM
What is the best method to expose Azure Databricks metrics to Prometheus specifically? And is it possible to get the underlying Spark metrics as well? All I can find clearly defined in the documentation is the serving-endpoint metrics:
https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/metrics-export-ser...
Please advise, thanks.
06-04-2024 08:46 AM
Hi @Shahe, how are you?
The standard way to do this is to first enable Prometheus in your Spark configuration by setting:
spark.ui.prometheus.enabled true
spark.metrics.namespace <app_name>
We recommend replacing <app_name> with the job name on job clusters.
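These two settings go in the cluster's Spark config (Compute > your cluster > Advanced options > Spark). If you manage clusters programmatically, here is a minimal sketch using the Clusters REST API; the workspace URL, token, and cluster fields are placeholders, and note that clusters/edit expects the full cluster spec, not just the changed fields:
# Sketch: set the Prometheus-related Spark conf via the Clusters API.
# All <...> values are placeholders; clusters/edit requires the full
# cluster spec, so carry over your existing settings as well.
import requests

resp = requests.post(
    "https://<workspace_url>/api/2.0/clusters/edit",
    headers={"Authorization": "Bearer <personal access token>"},
    json={
        "cluster_id": "<cluster_id>",
        "spark_version": "<current_spark_version>",
        "node_type_id": "<current_node_type>",
        "num_workers": 2,  # placeholder
        "spark_conf": {
            "spark.ui.prometheus.enabled": "true",
            "spark.metrics.namespace": "<app_name>",
        },
    },
)
resp.raise_for_status()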
Then, create an init script and attach it to the cluster:
# Write the init script to DBFS (the final True overwrites any existing copy)
dbutils.fs.put("dbfs:/dlyle/install-prometheus.sh",
"""#!/bin/bash
# Route Spark metrics to the built-in PrometheusServlet sink
cat <<EOF > /databricks/spark/conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF
# Enable streaming and application-status metric sources on the driver
cat >/databricks/driver/conf/00-custom-spark.conf <<EOF
[driver] {
spark.sql.streaming.metricsEnabled = "true"
spark.metrics.appStatusSource.enabled = "true"
}
EOF
""", True)
After the cluster restarts, confirm the driver's Spark UI port by running:
spark.conf.get("spark.ui.port")
Make a note of the driver port. You'll need this when setting up the scrape target.
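Before wiring up Prometheus, you can confirm the endpoint is actually serving metrics; a quick check from a notebook on the same cluster (a sketch, using the port from above):
# Sketch: fetch the Prometheus endpoint on the driver itself.
import urllib.request

port = spark.conf.get("spark.ui.port")
url = f"http://localhost:{port}/metrics/prometheus/"
print(urllib.request.urlopen(url).read().decode()[:1000])  # first chunk only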
- Generate a personal access token as shown here.
- Add the following to your prometheus_values.yaml:
extraScrapeConfigs: |
  - job_name: '<cluster_name>'
    metrics_path: /driver-proxy-api/o/0/<cluster_id>/<spark_ui_port>/metrics/prometheus/
    static_configs:
      - targets:
          - <workspace_url>
    authorization:
      # Sets the authentication type of the request.
      type: Bearer
      credentials: <personal access token>
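The metrics_path routes through the Databricks driver proxy, so you can verify the exact URL Prometheus will scrape before updating Helm; a sketch using the same placeholders as the YAML above:
# Sketch: hit the driver-proxy endpoint from outside the workspace,
# authenticating with the personal access token (placeholders as above).
import requests

url = ("https://<workspace_url>/driver-proxy-api/o/0/"
       "<cluster_id>/<spark_ui_port>/metrics/prometheus/")
resp = requests.get(url, headers={"Authorization": "Bearer <personal access token>"})
print(resp.status_code)
print(resp.text[:500])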
Update your Helm installation:
helm upgrade <installation_name - e.g. dlyle-prometheus> -f prometheus_values.yaml -n <namespace - e.g. prometheus-monitoring> --create-namespace prometheus-community/prometheus
The Prometheus server has a config-reloader sidecar container that should pick up the updated config automatically. You can also force a restart:
kubectl rollout restart deployment <installation_name> -n <namespace>
At this point, you should be able to see your cluster as a scrape target in Prometheus server.
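If you'd rather check from the command line than the UI, the Prometheus HTTP API lists active targets; a sketch assuming the server is port-forwarded to localhost:9090:
# Sketch: list active scrape targets and their health via the
# Prometheus HTTP API (assumes kubectl port-forward to localhost:9090).
import requests

targets = requests.get("http://localhost:9090/api/v1/targets").json()
for t in targets["data"]["activeTargets"]:
    print(t["labels"].get("job"), t["health"], t.get("lastError", ""))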
Follow this link for more reference:
https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html
06-12-2024 01:48 PM
Hello,
I don't have Databricks running as a pod in an AKS cluster; it's running on Azure as SaaS. What should I do to export the metrics to Prometheus?

