09-15-2023 04:43 AM
Hi Everyone,
Can someone suggest the best native job monitoring tool available in Databricks to fulfill my needs?
We need to monitor the following:
Number of failed jobs and their names, for the last 24 hours
Tables that are not getting data
Latest ingested timestamp
Table ingest rate: how much data is being ingested
Table ingest lag: whether a streaming job is further behind than expected
Table size: the size of the current table being ingested into
Query runtime: how long a query has been running
Thanks in advance
09-15-2023 08:35 AM
@Retired_mod I am new to this; could you please help me understand how we can achieve all of those? Will the Databricks Jobs API help me achieve this?
09-15-2023 08:59 AM
Can someone suggest which monitoring tool would help me here, and how we can achieve this?
09-17-2023 08:02 AM
Let me go through these one by one:
Number of failed jobs and their names, for the last 24 hours
[BA] Your best bet will be the upcoming system tables integration, which I believe is in preview. The general idea is that you get a Delta table with runs and their statuses. For now, you can also use the Job Runs page (Workflows > Job Runs), which shows job runs and their failures (also accessible via the API; see the sketch below).
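A minimal sketch of the API route, assuming the Jobs 2.1 runs/list endpoint and a personal access token (HOST and TOKEN are placeholders for your workspace URL and PAT):

import time
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

# Runs that started within the last 24 hours, in epoch milliseconds
cutoff_ms = int((time.time() - 24 * 3600) * 1000)

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"completed_only": "true", "start_time_from": cutoff_ms},
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    # result_state is FAILED for failed runs; run_name carries the name
    if run.get("state", {}).get("result_state") == "FAILED":
        print(run.get("run_name"), run.get("run_id"))

Note the response is paginated (has_more in the payload), so a real monitoring job would loop through all pages.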
Tables that are not getting data
[BA] Lakehouse Monitoring is the way to go! Also in Preview, I believe.
Latest ingested timestamp
Table ingest rate: how much data is being ingested
[BA] How are you ingesting data? If it lands in a Delta table, see the sketch below.
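If the data lands in a Delta table, a minimal sketch (the table name is hypothetical, and spark is the predefined session in a Databricks notebook): DESCRIBE HISTORY exposes each commit's timestamp and write metrics.

from pyspark.sql import functions as F

history = spark.sql("DESCRIBE HISTORY my_catalog.my_schema.events")

# Keep only commits that wrote data
writes = history.filter(
    F.col("operation").isin("WRITE", "STREAMING UPDATE", "MERGE")
)

# Latest ingested timestamp = timestamp of the most recent write commit
latest = writes.orderBy(F.col("timestamp").desc()).first()
print("latest ingest:", latest["timestamp"])

# Ingest rate: rows and bytes written per commit, from operationMetrics
writes.select(
    "timestamp",
    F.col("operationMetrics")["numOutputRows"].alias("rows_written"),
    F.col("operationMetrics")["numOutputBytes"].alias("bytes_written"),
).show(truncate=False)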
Table ingest lag: whether a streaming job is further behind than expected
[BA] We are working on this! You will get monitoring and alerting on streaming lag in both Structured Streaming and Delta Live Tables. Until then, see the sketch below.
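In the meantime, a minimal sketch of checking lag by hand in Structured Streaming, assuming query is the handle returned by writeStream.start():

# lastProgress is a dict describing the most recent micro-batch
# (or None before the first batch completes)
progress = query.lastProgress
if progress:
    print("batch:", progress["batchId"])
    print("input rows/sec:    ", progress.get("inputRowsPerSecond"))
    print("processed rows/sec:", progress.get("processedRowsPerSecond"))
    # If processing consistently trails input, the stream is falling behind
    for source in progress.get("sources", []):
        print(source.get("description"), "->",
              source.get("numInputRows"), "rows in batch")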
Table size: the size of the current table being ingested into
[BA] I'm not sure what you mean by this.
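If "table size" means the current size of the target Delta table, a minimal sketch (hypothetical table name again, spark predefined in a notebook): DESCRIBE DETAIL returns the size of the table's current snapshot.

detail = spark.sql("DESCRIBE DETAIL my_catalog.my_schema.events").first()
print("size (bytes):", detail["sizeInBytes"])
print("file count:  ", detail["numFiles"])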
Query runtime: how long a query has been running
[BA] What type of query are you interested in?
Thursday
You can use the Databricks API to collect all the required information:
https://docs.databricks.com/api/workspace/jobs/list
Load the output into a Delta table.
Use Databricks dashboards to display this data, and schedule the job that loads the job details as often as required. A sketch of that pipeline is below.
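A minimal sketch of that pipeline, assuming a notebook where spark is predefined (HOST, TOKEN, and the monitoring.job_run_history table name are placeholders):

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

# Pull completed runs from the Jobs 2.1 runs/list endpoint
runs = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"completed_only": "true"},
).json().get("runs", [])

# Flatten the fields a dashboard would need
rows = [
    (
        r["run_id"],
        r.get("run_name"),
        r.get("state", {}).get("result_state"),
        r.get("start_time"),
    )
    for r in runs
]
df = spark.createDataFrame(
    rows, "run_id LONG, run_name STRING, result_state STRING, start_time LONG"
)

# Append each poll to a Delta table for the dashboard to query
df.write.format("delta").mode("append").saveAsTable("monitoring.job_run_history")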