09-26-2022 03:34 PM
We have a Spark Structured Streaming application written in PySpark that we'd like to monitor with Datadog. By default, Datadog collects a couple of streaming metrics, such as 'spark.structured_streaming.processing_rate' and 'spark.structured_streaming.latency'. However, even after setting 'logs_enabled: true' and 'spark.sql.streaming.metricsEnabled = true' in the cluster init script, we're still unable to see any streaming metrics in Datadog. From some research, it seems we need to implement a subclass of 'StreamingQueryListener' from Spark Structured Streaming to make this work. Is this assumption correct? If so, is it possible to implement it in Python instead of Scala? I haven't seen a Python implementation anywhere, so I would appreciate it if someone could point me to an example, if one exists. Any help would be appreciated!
09-27-2022 08:48 AM
@Liz Zhang, please refer to the documentation below, which contains a PySpark implementation of StreamingQueryListener:
https://www.databricks.com/blog/2022/05/27/how-to-monitor-streaming-queries-in-pyspark.html
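
For a rough idea of what such a listener can look like in Python, here is a minimal sketch rather than a definitive implementation. It assumes a PySpark version that ships `pyspark.sql.streaming.StreamingQueryListener` (as far as I know, Databricks Runtime 11.0+ or open-source Spark 3.4+). The names `MetricsListener`, `extract_metrics`, and the `emit` callback are hypothetical and only for illustration; the actual forwarding to Datadog is left to whatever callable you pass in as `emit`.

```python
import json

try:
    from pyspark.sql.streaming import StreamingQueryListener
except ImportError:
    # Assumption: fall back to a plain base class so the sketch can be
    # loaded and unit-tested on a machine without PySpark installed.
    StreamingQueryListener = object


def extract_metrics(progress_json):
    """Pick a few numeric fields out of a progress report given as a JSON string."""
    progress = json.loads(progress_json)
    return {
        "num_input_rows": progress.get("numInputRows", 0),
        "input_rows_per_second": progress.get("inputRowsPerSecond", 0.0),
        "processed_rows_per_second": progress.get("processedRowsPerSecond", 0.0),
    }


class MetricsListener(StreamingQueryListener):
    """Hypothetical listener that hands query events to an `emit` callback."""

    def __init__(self, emit=print):
        # `emit(name, fields)` is a placeholder for your own reporting code,
        # for example a function that sends gauges to a metrics agent.
        self.emit = emit

    def onQueryStarted(self, event):
        self.emit("started", {"id": str(event.id)})

    def onQueryProgress(self, event):
        # `event.progress.json` is the compact JSON form of the progress report.
        self.emit("progress", extract_metrics(event.progress.json))

    def onQueryIdle(self, event):
        # Only called on newer Spark versions; harmless to define on older ones.
        pass

    def onQueryTerminated(self, event):
        self.emit("terminated", {"id": str(event.id)})
```

On a cluster, the listener would be registered once, before the streaming query starts, with `spark.streams.addListener(MetricsListener())`. Keeping the field extraction in a separate function means it can be tested without a running Spark session.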