cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

YFL
by New Contributor III
  • 6838 Views
  • 11 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a delta table. I see that in query progress logs, there is some information about the last version and the last file in the version for the end offset, but this don't give ...

  • 6838 Views
  • 11 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman​ I hope all is well.Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you.Thanks!

  • 6 kudos
10 More Replies
SaravananPalani
by New Contributor II
  • 24268 Views
  • 8 replies
  • 9 kudos

Is there any way to monitor the CPU, disk and memory usage of a cluster while a job is running?

I am looking for something preferably similar to Windows task manager which we can use for monitoring the CPU, memory and disk usage for local desktop.

  • 24268 Views
  • 8 replies
  • 9 kudos
Latest Reply
hitech88
New Contributor II
  • 9 kudos

Some important info to look in Gangalia UI in CPU, memory and server load charts to spot the problem:CPU chart :User %Idle %High percentage of user % indicates heavy CPU usage in the cluster.Memory chart : Use %Free %Swap % If you see purple line ove...

  • 9 kudos
7 More Replies
Anonymous
by Not applicable
  • 1243 Views
  • 1 replies
  • 0 kudos

Monitoring

Are there any event streams that are or could be exposed in AWS (such as Cloudwatch Eventbridge events or SNS messages? In particular I'm interested in events that detail jobs being run. The use case here would be for monitoring jobs from our web app...

  • 1243 Views
  • 1 replies
  • 0 kudos
Latest Reply
jessykoo32
New Contributor II
  • 0 kudos

Yes, there are several event streams in AWS that can be used to monitor jobs being run. Your Texas BenefitsCloudWatch Events: This service allows you to set up rules to automatically trigger actions in response to specific events in other AWS service...

  • 0 kudos
Archana
by New Contributor
  • 4836 Views
  • 2 replies
  • 0 kudos

What are the metrics to be considered for monitoring the Databricks

I am very new to Databricks, just setting up with things. I would like to explore various features of Databricks and start playing around with the environment.I am curious to know what are the metrics should be considered for monitoring the complete ...

  • 4836 Views
  • 2 replies
  • 0 kudos
Latest Reply
jessykoo32
New Contributor II
  • 0 kudos

Databricks is a powerful platform for data engineering, machine learning, and analytics, and it is important to monitor the performance and health of your Databricks environment to ensure that it is running smoothly.Here are a few key metrics that yo...

  • 0 kudos
1 More Replies
Tahseen0354
by Valued Contributor
  • 4532 Views
  • 2 replies
  • 4 kudos

Resolved! How do I track databricks cluster users ?

Hi, is there a way to find out/monitor which users has used my cluster, how long and how many times in an azure databricks workspace ?

  • 4532 Views
  • 2 replies
  • 4 kudos
Latest Reply
youssefmrini
Databricks Employee
  • 4 kudos

Hello, You can activate Audit logs ( More specifically Cluster logs) https://learn.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/azure-diagnostic-logs It can be very helpful to track all the metrics.

  • 4 kudos
1 More Replies
Lizzz
by New Contributor II
  • 3373 Views
  • 1 replies
  • 3 kudos

Resolved! Forward Spark structured streaming metrics to Datadog

We have a spark streaming application written in Pyspark that we'd like to monitor with Datadog. By default, datadog collects a couple of streaming metrics like 'spark.structured_streaming.processing_rate' and 'spark.structured_streaming.latency'. Ho...

  • 3373 Views
  • 1 replies
  • 3 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

@Liz Zhang​ , Please refer to the below documentation contain pyspark implementation of streamingQueryListener https://www.databricks.com/blog/2022/05/27/how-to-monitor-streaming-queries-in-pyspark.html

  • 3 kudos
ivanychev
by Contributor II
  • 1289 Views
  • 0 replies
  • 1 kudos

How to enable remote JMX monitoring in Databricks?

Adding these optionsEXTRA_JAVA_OPTIONS = ( '-Dcom.sun.management.jmxremote.port=9999', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', )is enough in vanilla Apache Spark, but apparently it ...

  • 1289 Views
  • 0 replies
  • 1 kudos
TyronZerafa
by New Contributor II
  • 1724 Views
  • 0 replies
  • 2 kudos

Integrating with Prometheus

How can I integrate Databricks clusters with Prometheus? I tried adding the following Spark property to my cluster but cannot find the Prometheus metrics endpoints. Any thoughts? spark.ui.prometheus.enabled = true

  • 1724 Views
  • 0 replies
  • 2 kudos
Labels