Forward Spark structured streaming metrics to Datadog

Lizzz
New Contributor II

We have a Spark Structured Streaming application written in PySpark that we'd like to monitor with Datadog. By default, Datadog collects a couple of streaming metrics, such as 'spark.structured_streaming.processing_rate' and 'spark.structured_streaming.latency'. However, even after setting 'logs_enabled: true' and 'spark.sql.streaming.metricsEnabled = true' in the cluster init script, we're still unable to see any streaming metrics in Datadog. After some research, it seems we need to implement a subclass of 'StreamingQueryListener' from Spark streaming to make this work. Is this assumption correct? If so, is it possible to implement it in Python instead of Scala? I haven't seen a Python implementation anywhere, so I would appreciate it if someone could point me to an example. Any help would be appreciated!

shan_chandra
Databricks Employee

@Liz Zhang, please refer to the blog post below, which contains a PySpark implementation of StreamingQueryListener:

https://www.databricks.com/blog/2022/05/27/how-to-monitor-streaming-queries-in-pyspark.html
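
For orientation, here is a minimal sketch of what such a listener could look like in Python. It is not the code from the blog post. It assumes a runtime where `pyspark.sql.streaming.StreamingQueryListener` is available and where `event.progress.json` returns the progress as a JSON string. The `custom.streaming` metric prefix, the `progress_to_gauges` helper, and the `DatadogListener` class name are hypothetical names chosen for illustration, and the `emit` callable is assumed to have the shape of a DogStatsD-style `gauge(name, value, tags=...)` function.

```python
import json
from typing import Callable, List, Tuple

try:
    from pyspark.sql.streaming import StreamingQueryListener
except ImportError:
    # Fallback so the helper below can be exercised without Spark installed.
    StreamingQueryListener = object

PREFIX = "custom.streaming"  # hypothetical metric namespace, pick your own
Gauge = Tuple[str, float, List[str]]


def progress_to_gauges(progress_json: str) -> List[Gauge]:
    """Turn one StreamingQueryProgress JSON string into (name, value, tags) tuples."""
    p = json.loads(progress_json)
    tags = [f"query_name:{p.get('name')}", f"query_id:{p.get('id')}"]
    candidates = {
        "num_input_rows": p.get("numInputRows"),
        "input_rows_per_second": p.get("inputRowsPerSecond"),
        "processed_rows_per_second": p.get("processedRowsPerSecond"),
        "trigger_execution_ms": p.get("durationMs", {}).get("triggerExecution"),
    }
    # Fields missing from the progress JSON are skipped rather than sent as 0.
    return [
        (f"{PREFIX}.{key}", float(value), tags)
        for key, value in candidates.items()
        if value is not None
    ]


class DatadogListener(StreamingQueryListener):
    """Forwards per-batch progress metrics through an injected emit function."""

    def __init__(self, emit: Callable[..., None]):
        super().__init__()
        self._emit = emit  # assumed signature: emit(name, value, tags=[...])

    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        for name, value, tags in progress_to_gauges(event.progress.json):
            self._emit(name, value, tags=tags)

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        pass


# Usage on a cluster (assumes the `datadog` package is installed and a
# Datadog Agent with DogStatsD is reachable from the driver):
#   from datadog import statsd
#   spark.streams.addListener(DatadogListener(statsd.gauge))
```

Passing the emit function in, instead of importing a Datadog client inside the class, keeps the listener easy to test and lets you swap in a different client. The listener runs on the driver, so the Datadog Agent needs to be reachable from there.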
