Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Azure Databricks Metrics to Prometheus?

Shahe
New Contributor

What is the best method to expose Azure Databricks metrics to Prometheus specifically? Is it also possible to get the underlying Spark metrics? All I can see clearly defined in the documentation is the serving endpoint metrics:

https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/metrics-export-ser...

Please advise, thanks.


anardinelli
New Contributor III

Hi @Shahe, how are you?

The standard approach is to first configure Spark to enable the Prometheus servlet by setting the following Spark configs on the cluster:

spark.ui.prometheus.enabled true
spark.metrics.namespace <app_name>

We recommend replacing <app_name> with job names for job clusters.
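If you create clusters through the Clusters API rather than the UI, the same two settings can be sketched in the `spark_conf` block of the request payload (illustrative; `my_job` is a placeholder namespace):

```json
{
  "spark_conf": {
    "spark.ui.prometheus.enabled": "true",
    "spark.metrics.namespace": "my_job"
  }
}
```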

Then, create an init script and attach it to the cluster:

# Write the init script to DBFS; attach it to the cluster as a
# cluster-scoped init script so it runs on each node at startup.
dbutils.fs.put("dbfs:/dlyle/install-prometheus.sh",
"""
#!/bin/bash
# Expose Spark metrics through the built-in PrometheusServlet sink
cat <<EOF > /databricks/spark/conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF
# Enable streaming and app-status metric sources on the driver
cat >/databricks/driver/conf/00-custom-spark.conf <<EOF
[driver] {
  spark.sql.streaming.metricsEnabled = "true"
  spark.metrics.appStatusSource.enabled = "true"
}
EOF
""", True)

Confirm the driver port by running: 

spark.conf.get("spark.ui.port")

Make a note of the driver port. You'll need this when setting up the scrape target.
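The scrape target is served through the Databricks driver proxy, so the path combines the cluster ID and the driver port noted above. A minimal sketch of how that path is assembled (the cluster ID and port below are placeholder values; read the real port with `spark.conf.get("spark.ui.port")`):

```python
# Build the driver-proxy metrics path used as the Prometheus scrape target.
# org_id "0" matches the path shown in the scrape config below; adjust for
# your workspace if needed.
def driver_proxy_metrics_path(cluster_id: str, port: str, org_id: str = "0") -> str:
    return f"/driver-proxy-api/o/{org_id}/{cluster_id}/{port}/metrics/prometheus/"

# Placeholder cluster ID and port for illustration only
print(driver_proxy_metrics_path("0123-456789-abcde", "40001"))
# /driver-proxy-api/o/0/0123-456789-abcde/40001/metrics/prometheus/
```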

  1. Generate a personal access token as shown here

  2. Add the following to your prometheus_values.yaml

extraScrapeConfigs: |
   - job_name: '<cluster_name>'
     metrics_path: /driver-proxy-api/o/0/<cluster_id>/<spark_ui_port>/metrics/prometheus/
     static_configs:
       - targets:
         - <workspace_url>
     authorization:
       # Sets the authentication type of the request.
       type: Bearer
       credentials: <personal access token>

Update your Helm installation: helm upgrade <installation_name, e.g. dlyle-prometheus> -f prometheus_values.yaml -n <namespace, e.g. prometheus-monitoring> --create-namespace prometheus-community/prometheus

The Prometheus server has a sidecar container that should automatically reload the config. You can also force a restart: kubectl rollout restart deployment <installation_name> -n <namespace>

At this point, you should be able to see your cluster as a scrape target in Prometheus server.
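One way to verify this programmatically is to query the Prometheus `/api/v1/targets` endpoint and check the health of your job. A sketch of that check, using a hypothetical sample payload (in practice you would fetch the JSON from your Prometheus server):

```python
import json

# Hypothetical sample of a /api/v1/targets response, for illustration only.
# Fetch the real payload from http://<prometheus-server>/api/v1/targets.
sample = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "my-databricks-cluster"}, "health": "up"}
    ]}
})

def target_health(payload: str, job: str) -> str:
    """Return the health of the named scrape job, or 'missing' if absent."""
    for target in json.loads(payload)["data"]["activeTargets"]:
        if target["labels"].get("job") == job:
            return target["health"]
    return "missing"

print(target_health(sample, "my-databricks-cluster"))  # up
```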

Follow this link for more reference:

https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html

 

DanielB
New Contributor II

Hello,

I don't have Databricks running as a pod in an AKS cluster; it's running on Azure as SaaS. What should I do to export the metrics to Prometheus?
