
Azure Databricks Metrics to Prometheus?

Shahe
New Contributor

What is the best way to expose Azure Databricks metrics to Prometheus, and is it possible to get the underlying Spark metrics as well? The only thing I can see clearly defined in the documentation is the serving endpoint metrics:

https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/metrics-export-ser...

Please advise, thanks.

2 REPLIES

anardinelli
Databricks Employee

Hi @Shahe, how are you?

The standard approach is to first configure Spark to enable the Prometheus servlet by setting:

spark.ui.prometheus.enabled true
spark.metrics.namespace <app_name>

We recommend replacing <app_name> with job names for job clusters.
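
As a quick sanity check (a sketch, assuming both options were actually set in the cluster's Spark config, since spark.conf.get raises an error for unset keys), you can read them back from a notebook:

# Verify the Prometheus-related settings from a notebook on the cluster.
print(spark.conf.get("spark.ui.prometheus.enabled"))  # expected: "true"
print(spark.conf.get("spark.metrics.namespace"))      # expected: your <app_name>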

Then, create an init script and attach it to the cluster:

dbutils.fs.put("dbfs:/dlyle/install-prometheus.sh",
"""
#!/bin/bash
# Route all Spark metrics sinks through the built-in PrometheusServlet.
cat <<EOF > /databricks/spark/conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF
# Enable streaming and app-status metric sources on the driver.
cat >/databricks/driver/conf/00-custom-spark.conf <<EOF
[driver] {
  spark.sql.streaming.metricsEnabled = "true"
  spark.metrics.appStatusSource.enabled = "true"
}
EOF
""", True)

Confirm the driver port by running: 

spark.conf.get("spark.ui.port")

Make a note of the driver port. You'll need this when setting up the scrape target.
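
If it helps, here is a small sketch that assembles the full scrape path used in the config below; it assumes the standard spark.databricks.clusterUsageTags.clusterId config is available on the cluster:

# Build the driver-proxy metrics path for the Prometheus scrape config.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
ui_port = spark.conf.get("spark.ui.port")
print(f"/driver-proxy-api/o/0/{cluster_id}/{ui_port}/metrics/prometheus/")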

  1. Generate a personal access token as described in the Databricks documentation

  2. Add the following to your prometheus_values.yaml

extraScrapeConfigs: |
   - job_name: '<cluster_name>'
     metrics_path: /driver-proxy-api/o/0/<cluster_id>/<spark_ui_port>/metrics/prometheus/
     static_configs:
       - targets:
         - <workspace_url>
     authorization:
       # Sets the authentication type of the request.
       type: Bearer
       credentials: <personal access token>

Update your Helm installation: helm upgrade <installation_name, e.g. dlyle-prometheus> -f prometheus_values.yaml -n <namespace, e.g. prometheus-monitoring> --create-namespace prometheus-community/prometheus

Prometheus server has a sidecar container that should automatically update the config. You can also force a restart: kubectl rollout restart deployment <installation_name> -n <namespace>

At this point, you should be able to see your cluster as a scrape target in Prometheus server.
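
If the target doesn't appear, you can hit the endpoint directly with the same bearer token before debugging Prometheus itself (a sketch with hypothetical placeholders; substitute your workspace URL, cluster ID, Spark UI port, and token):

import requests

# Hypothetical placeholders - fill in your own values.
workspace_url = "https://<workspace_url>"
scrape_path = "/driver-proxy-api/o/0/<cluster_id>/<spark_ui_port>/metrics/prometheus/"
token = "<personal access token>"

resp = requests.get(workspace_url + scrape_path,
                    headers={"Authorization": f"Bearer {token}"})
print(resp.status_code)   # expect 200 when the path and token are correct
print(resp.text[:300])    # first few Prometheus metric lines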

Follow this link for more reference:

https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html

 

DanielB
New Contributor II

Hello

I don't have Databricks running as a pod in an AKS cluster; it's running on Azure as SaaS. What should I do to export the metrics to Prometheus?
