Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
REST API for Stream Monitoring

Baldrez
New Contributor II

Hi, everyone. I just recently started using Databricks on Azure, so my question is probably very basic, but I am really stuck right now.

I need to capture some streaming metrics (the number of input rows and their timestamps), so I tried using the Spark REST API. However, I get the following error: "no streaming listener attached to Databricks Shell". I tried different solutions I have seen in videos and tutorials, but none have worked so far. This only happens when I try to get the stream statistics; if I use the API for jobs or stages, I get the JSON as expected.

Here is the code I am trying to run:

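import requests
import json

# Driver host and UI port of the running Spark application
driverIp = spark.conf.get("spark.driver.host")
port = spark.conf.get("spark.ui.port")

# Look up the application ID through the Spark REST API
temp_url = f"http://{driverIp}:{port}/api/v1/applications"
temp_r = requests.get(temp_url, timeout=10.0)
content_r = json.loads(temp_r.content)
app_id = content_r[0]["id"]

# Request the streaming statistics (this is the call that returns the error)
url = f"http://{driverIp}:{port}/api/v1/applications/{app_id}/streaming/statistics"
r = requests.get(url, timeout=10.0)
print(r.content)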

I understand that I should attach the streaming listener in order to get the metrics I need, but I still do not understand how to implement it in the code. Could someone please help me with this issue?

Thanks a lot in advance

1 ACCEPTED SOLUTION

User16763506477
Contributor III

Hi @Roberto Baldrez, you will need to add the configs below to the cluster:

spark.sql.streaming.metricsEnabled true
*.sink.servlet.class org.apache.spark.metrics.sink.MetricsServlet
*.sink.servlet.path /metrics/json
master.sink.servlet.path /metrics/master/json
applications.sink.servlet.path /metrics/applications/json

The URL will change to "http://<driverIP>:<port>/metrics/json/". The one you mentioned is for DStream applications.

Note: This gives limited streaming metrics. If you need all metrics, you will need to add a metrics sink to the cluster.

More info
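
As a rough sketch of how the new URL could be queried once the configs are in place: the servlet returns a Dropwizard-style JSON document, and Structured Streaming gauges normally carry `.spark.streaming.` in their names (for example `inputRate-total` and `processingRate-total`). The function names `fetch_metrics` and `streaming_gauges` and the sample payload are made up for illustration, and the exact metric names on a given cluster should be checked against the real output. Note that these gauges report rates, not raw row counts.

```python
import json
from urllib.request import urlopen


def fetch_metrics(driver_ip: str, port: str, timeout: float = 10.0) -> dict:
    """Download the metrics JSON document from the driver UI (hypothetical helper)."""
    url = f"http://{driver_ip}:{port}/metrics/json/"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def streaming_gauges(metrics: dict) -> dict:
    """Keep only the gauges whose name contains '.spark.streaming.'."""
    gauges = metrics.get("gauges", {})
    return {
        name: body.get("value")
        for name, body in gauges.items()
        if ".spark.streaming." in name
    }


# Hand-written sample in the expected JSON shape; the values are not real measurements.
sample = {
    "version": "4.0.0",
    "gauges": {
        "app-1.driver.spark.streaming.my_query.inputRate-total": {"value": 120.5},
        "app-1.driver.spark.streaming.my_query.processingRate-total": {"value": 300.0},
        "app-1.driver.BlockManager.memory.memUsed_MB": {"value": 12},
    },
    "counters": {},
}

print(streaming_gauges(sample))
```

In a notebook, the live call would look like `streaming_gauges(fetch_metrics(driverIp, port))`, reusing the `driverIp` and `port` values from the question.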


4 REPLIES

Anonymous
Not applicable

Hi @Roberto Baldrez - My name is Piper and I'm one of the community moderators. Thanks for your question. Let's give it a bit to see what the community says. Thank you for your patience.

Could you please tell us where the cluster configs are set? I cannot find them. Thanks.
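
For reference, cluster-level Spark settings on Databricks are usually entered one "key value" pair per line in the Spark config box under the cluster's advanced options, and clusters defined through the Clusters REST API carry the same pairs in a `spark_conf` field. The sketch below only converts the lines from the accepted solution into that mapping. The helper name `to_spark_conf` is made up for illustration, and whether the `*.sink.*` keys need a `spark.metrics.conf.` prefix on a given runtime is an open assumption to verify.

```python
# The config lines exactly as posted in the accepted solution.
POSTED_CONFIG = """\
spark.sql.streaming.metricsEnabled true
*.sink.servlet.class org.apache.spark.metrics.sink.MetricsServlet
*.sink.servlet.path /metrics/json
master.sink.servlet.path /metrics/master/json
applications.sink.servlet.path /metrics/applications/json
"""


def to_spark_conf(text: str) -> dict:
    """Turn 'key value' lines into a {key: value} mapping (hypothetical helper)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace only, so values stay intact.
        key, value = line.split(None, 1)
        conf[key] = value
    return conf


spark_conf = to_spark_conf(POSTED_CONFIG)

# Fragment of a cluster specification; all other required fields are omitted.
cluster_spec_fragment = {"spark_conf": spark_conf}
print(cluster_spec_fragment)
```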

jose_gonzalez
Databricks Employee

Hi @Roberto Baldrez,

If you think that @Gaurav Rupnar answered your question, please select that reply as the best response so it can be moved to the top of the topic and help more users in the future.

Thank you
