Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Azure Databricks Metrics to Prometheus?

Shahe
New Contributor

What is the best method to expose Azure Databricks metrics to Prometheus specifically? And is it possible to get the underlying Spark metrics as well? The only thing I can see clearly defined in the documentation is the serving endpoint metrics:

https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/metrics-export-ser...

Please advise, thanks.

2 REPLIES

anardinelli
Databricks Employee

Hi @Shahe, how are you?

The standard way of doing so is to first configure Spark to enable the Prometheus endpoint using:

spark.ui.prometheus.enabled true
spark.metrics.namespace <app_name>

We recommend replacing <app_name> with the job name for job clusters.
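
To confirm the settings took effect, here is a quick sanity check you can run from a notebook on the cluster (a minimal sketch; spark.conf.get raises if a key was never set):

# Both keys should echo back the values configured above.
print(spark.conf.get("spark.ui.prometheus.enabled"))  # expected: "true"
print(spark.conf.get("spark.metrics.namespace"))      # expected: <app_name>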

Then, create an init script and attach it to the cluster:

dbutils.fs.put("dbfs:/dlyle/install-prometheus.sh",
"""
#!/bin/bash
# Route all Spark metrics sinks to the built-in PrometheusServlet.
cat <<EOF > /databricks/spark/conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF
# Enable streaming and app-status metric sources on the driver.
cat >/databricks/driver/conf/00-custom-spark.conf <<EOF
[driver] {
  spark.sql.streaming.metricsEnabled = "true"
  spark.metrics.appStatusSource.enabled = "true"
}
EOF
""", True)

Confirm the driver port by running: 

spark.conf.get("spark.ui.port")

Make a note of the driver port. You'll need this when setting up the scrape target.
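
As a quick sanity check from a notebook on the same cluster, you can hit the servlet locally on that port (a sketch; it assumes the init script above is attached and the cluster has been restarted):

import requests

# Probe the PrometheusServlet directly on the driver's Spark UI port.
port = spark.conf.get("spark.ui.port")
resp = requests.get(f"http://localhost:{port}/metrics/prometheus/")
print(resp.status_code)  # expect 200
print(resp.text[:500])   # first few exposed metrics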

  1. Generate a personal access token as described in the Databricks documentation

  2. Add the following to your prometheus_values.yaml

extraScrapeConfigs: |
   - job_name: '<cluster_name>'
     metrics_path: /driver-proxy-api/o/0/<cluster_id>/<spark_ui_port>/metrics/prometheus/
     static_configs:
       - targets:
         - <workspace_url>
     authorization:
       # Sets the authentication type of the request.
       type: Bearer
       credentials: <personal access token>
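
To verify the target end to end before Prometheus scrapes it, you can replay the same request the scraper will make (a sketch; the workspace URL, cluster ID, driver port, and token below are placeholders for your own values):

import requests

# The exact request Prometheus will issue, per the scrape config above.
workspace = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
cluster_id = "0123-456789-abcde123"                               # placeholder
port = "40001"                                                    # driver port noted earlier
token = "<personal access token>"

url = f"{workspace}/driver-proxy-api/o/0/{cluster_id}/{port}/metrics/prometheus/"
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code)  # expect 200 with plaintext Prometheus metrics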

Update your Helm installation:

helm upgrade <installation_name, e.g. dlyle-prometheus> -f prometheus_values.yaml -n <namespace, e.g. prometheus-monitoring> --create-namespace prometheus-community/prometheus

The Prometheus server has a sidecar container that should automatically pick up the updated config. You can also force a restart:

kubectl rollout restart deployment <installation_name> -n <namespace>

At this point, you should be able to see your cluster as a scrape target in Prometheus server.
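
You can also confirm this from the Prometheus side through its HTTP API (a sketch; it assumes the server has been port-forwarded to localhost:9090, e.g. with kubectl port-forward):

import requests

# List active scrape targets; the cluster's job_name should show health "up".
targets = requests.get("http://localhost:9090/api/v1/targets").json()
for t in targets["data"]["activeTargets"]:
    print(t["labels"]["job"], t["health"])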

Follow this link for more reference:

https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html

 

DanielB
New Contributor II

Hello

I don't have Databricks running as a pod in an AKS cluster; it's running on Azure as SaaS. What should I do to export the metrics to Prometheus?
