10-14-2021 06:42 AM
Hi, everyone. I recently started using Databricks on Azure, so my question is probably very basic, but I am really stuck right now.
I need to capture some streaming metrics (the number of input rows and their timestamps), so I tried using the Spark REST API. However, I get the following error: "no streaming listener attached to Databricks Shell". I have tried several solutions from videos and tutorials, but none have worked so far. This only happens when I request the streaming statistics; if I use the API for jobs or stages, I get the JSON as expected.
Here is the code I am trying to run:
import requests
import json
# Driver host and Spark UI port of the running cluster
driverIp = spark.conf.get("spark.driver.host")
port = spark.conf.get("spark.ui.port")
# List the applications and take the id of the first one
temp_url = f"http://{driverIp}:{port}/api/v1/applications"
temp_r = requests.get(temp_url, timeout=10.0)
content_r = json.loads(temp_r.content)
app_id = content_r[0]["id"]
# Request the streaming statistics for that application
url = f"http://{driverIp}:{port}/api/v1/applications/{app_id}/streaming/statistics"
r = requests.get(url, timeout=10.0)
print(r.content)
I understand that I should attach the streaming listener in order to get the metrics I need, but I still don't understand how to implement it in the code. Could someone please help me with this issue?
Thanks a lot in advance
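For Structured Streaming queries, one way to read the input row counts and their timestamps without a REST call is to poll the query's own progress reports. In PySpark, StreamingQuery.recentProgress returns a list of dicts (and lastProgress returns the latest one, or None), each carrying a "timestamp" and a "numInputRows" field. The helper below is a minimal sketch; the name input_rows_over_time is hypothetical, and it assumes `query` is the object returned by df.writeStream...start(). A listener-based route also exists (a StreamingQueryListener registered with spark.streams.addListener), available from Scala/Java and, in newer releases, from Python.

```python
from typing import Any, Dict, List, Tuple


def input_rows_over_time(progress: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Return (trigger timestamp, rows read) pairs from Structured Streaming progress reports.

    Each element is expected to be a dict like those in StreamingQuery.recentProgress;
    only the "timestamp" and "numInputRows" fields are used here.
    """
    return [(p["timestamp"], int(p.get("numInputRows", 0))) for p in progress]


# Hypothetical usage in a Databricks notebook, where `query` is the
# StreamingQuery returned by df.writeStream...start():
#
#   for ts, rows in input_rows_over_time(query.recentProgress):
#       print(ts, rows)
```

Because recentProgress only keeps a limited number of recent reports, polling it periodically (or using a listener) is needed to build a longer history.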
10-20-2021 04:23 PM
Hi @Roberto Baldrez, you will need to add the following configs to the cluster:
spark.sql.streaming.metricsEnabled true
*.sink.servlet.class org.apache.spark.metrics.sink.MetricsServlet
*.sink.servlet.path /metrics/json
master.sink.servlet.path /metrics/master/json
applications.sink.servlet.path /metrics/applications/json
The URL then changes to "http://<driverIP>:<port>/metrics/json/". The endpoint you used (/streaming/statistics) is for DStream applications.
Note: this gives only limited streaming metrics. If you need all metrics, you will need to add a metrics sink to the cluster.
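Once the servlet sink is configured, the JSON can be fetched from the driver and filtered down to the streaming entries. The sketch below is illustrative only: the helper names fetch_metrics and streaming_gauges are hypothetical, it uses the standard library's urllib instead of requests, and it assumes the response has a top-level "gauges" object and that Structured Streaming gauges have "spark.streaming." in their names (for example ...spark.streaming.<query>.inputRate-total). Check the actual JSON on your cluster before relying on that naming pattern.

```python
import json
import urllib.request
from typing import Any, Dict


def streaming_gauges(metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only gauges whose names look like Structured Streaming metrics."""
    gauges = metrics.get("gauges", {})
    return {
        name: body.get("value")
        for name, body in gauges.items()
        if "spark.streaming." in name  # assumed naming pattern, see lead-in
    }


def fetch_metrics(driver_ip: str, port: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Download the MetricsServlet JSON from the driver (path from the config above)."""
    url = f"http://{driver_ip}:{port}/metrics/json/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage in a notebook, reading the driver host and port as in the question:
#   driver_ip = spark.conf.get("spark.driver.host")
#   port = spark.conf.get("spark.ui.port")
#   print(streaming_gauges(fetch_metrics(driver_ip, port)))
```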
10-15-2021 09:28 AM
Hi @Roberto Baldrez - My name is Piper and I'm one of the community moderators. Thanks for your question. Let's give it a bit to see what the community says. Thank you for your patience.
02-21-2024 09:08 AM
Could you please tell us where these cluster configs go? I cannot find where to set them. Thanks.
10-26-2021 04:55 PM
Hi @Roberto Baldrez,
If you think that @Gaurav Rupnar solved your question, please select it as the best response so it can be moved to the top of the topic and help more users in the future.
Thank you