Hi @Sifflet,
This is a genuinely complex problem. You mentioned alerting and monitoring, but in my experience the biggest lever for reducing noise is to treat problems at the source, i.e. in the transformation layer: have the transformations enforce the core data rules, and let monitoring validate those rules rather than flag every blip. The key is balance: set guardrails that are meaningful (prioritize business-critical expectations), use rolling baselines and grace periods to avoid flapping, add severity levels, and suppress downstream alerts when an upstream dependency is already failing.
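As a rough illustration of the baseline/severity/suppression idea (all table names, columns, and thresholds below are placeholders I made up, not anything from your stack), the logic can be as simple as:

```python
# Sketch only: rolling baseline + severity tiers + upstream suppression for one table.
# Assumes a Databricks notebook (global `spark`) and hypothetical tables
# dq.daily_row_counts and dq.validation_results.
from pyspark.sql import functions as F

# Rolling 4-week baseline of daily row counts for the monitored table.
history = (
    spark.table("dq.daily_row_counts")
    .filter(F.col("table_name") == "silver.orders")
    .orderBy(F.col("run_date").desc())
    .limit(28)
)
stats = history.agg(
    F.avg("row_count").alias("mean"),
    F.stddev("row_count").alias("std"),
).first()

today_count = spark.table("silver.orders").count()

# Suppress the alert if the upstream table already failed validation today.
upstream_failed = (
    spark.table("dq.validation_results")
    .filter(F.col("table") == "bronze.orders")
    .filter(F.to_date("run_ts") == F.current_date())
    .filter(~F.col("success"))
    .count()
    > 0
)

severity = None
if not upstream_failed and stats["std"]:
    deviation = abs(today_count - stats["mean"])
    if deviation > 3 * stats["std"]:
        severity = "critical"
    elif deviation > 2 * stats["std"]:
        severity = "warning"

if severity:
    print(f"[{severity}] silver.orders count {today_count} vs baseline {stats['mean']:.0f}")
```

Warnings can feed a dashboard or daily digest while only criticals page someone; in my experience that split is where most of the noise reduction comes from.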
I wrote a short piece on data quality with PySpark + Great Expectations on Databricks. I use this approach day-to-day and persist the validation results to Delta tables to power a lightweight data-quality dashboard (pass/fail rates, trend lines, ownership). If helpful, here’s the link:
https://community.databricks.com/t5/knowledge-sharing-hub/data-quality-with-pyspark-and-great-expect...
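The core of that pattern, heavily simplified (table and column names are placeholders, and the legacy SparkDFDataset API shown here varies across Great Expectations versions), looks roughly like this:

```python
# Sketch only: validate a Spark DataFrame with the legacy Great Expectations
# SparkDFDataset API and append one summary row per run to a Delta table.
from datetime import datetime, timezone

from great_expectations.dataset import SparkDFDataset

df = spark.table("silver.orders")          # hypothetical source table
gdf = SparkDFDataset(df)

# Keep the suite small: business-critical expectations only.
gdf.expect_column_values_to_not_be_null("order_id")
gdf.expect_column_values_to_be_between("amount", min_value=0)

results = gdf.validate()

summary = spark.createDataFrame(
    [(
        "silver.orders",
        results["success"],
        results["statistics"]["evaluated_expectations"],
        results["statistics"]["unsuccessful_expectations"],
        datetime.now(timezone.utc).isoformat(),
    )],
    "table string, success boolean, evaluated int, failed int, run_ts string",
)

# Append to the Delta table that feeds the data-quality dashboard.
summary.write.format("delta").mode("append").saveAsTable("dq.validation_results")
```

Appending one summary row per run gives the dashboard its pass/fail rates and trend lines almost for free.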
Hope that helps!
Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa