Best practices for reducing noise in data quality monitoring?

Sifflet
New Contributor II

Hi all,

We’ve been improving our data quality monitoring for several pipelines, but we keep running into the same problem — too many alerts, most of which aren’t actionable. Over time, it becomes harder to trust them.

Right now, we’re doing:

  • Freshness checks

  • Volume anomaly detection

  • Schema change alerts

  • Some data lineage tracking

Recently, we started using Sifflet to automate checks and add context, which has already reduced alert fatigue quite a bit. But I’d love to hear what others are doing to strike the right balance between coverage and noise.

How do you configure your checks so alerts are both accurate and actionable?

1 REPLY

WiliamRosa
New Contributor III

Hi @Sifflet
This is a genuinely complex problem. You mentioned alerting and monitoring, but in my experience the biggest lever for reducing noise is to treat problems at the source, i.e., in the transformation layer. Make the transformations enforce the core data rules, and let monitoring validate those rules rather than flag every blip. The key is balance: set meaningful guardrails (prioritize business-critical expectations), use rolling baselines and grace periods to avoid flapping, add severity levels, and suppress downstream alerts when an upstream dependency is already failing.
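To make the rolling baseline, grace period, severity, and upstream suppression ideas concrete, here is a minimal plain-Python sketch. It is only an illustration under stated assumptions, not the API of any particular tool: `VolumeCheck`, `alerts_to_send`, the z-score threshold, and the table names are all hypothetical, and the lineage is assumed to be acyclic.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class VolumeCheck:
    """Hypothetical row-count check with a rolling baseline and a grace period."""
    window: int = 7            # number of healthy past runs kept as the baseline
    z_threshold: float = 3.0   # deviation (in std devs) that counts as a breach
    grace_runs: int = 2        # consecutive breaches needed before 'critical'
    history: deque = field(default_factory=deque)
    breaches: int = 0

    def observe(self, row_count: int) -> str:
        """Return 'ok', 'warn', or 'critical' for this run."""
        status = "ok"
        if len(self.history) >= 3:  # need a few runs before judging
            mu = mean(self.history)
            sigma = stdev(self.history) or 1.0  # avoid dividing by zero
            if abs(row_count - mu) / sigma > self.z_threshold:
                self.breaches += 1
                # Grace period: the first breach only warns.
                status = "critical" if self.breaches >= self.grace_runs else "warn"
            else:
                self.breaches = 0
        if status == "ok":
            # Only healthy runs feed the baseline, so one spike does not skew it.
            # (A real system would also need a way to accept a permanent shift.)
            self.history.append(row_count)
            if len(self.history) > self.window:
                self.history.popleft()
        return status


def alerts_to_send(statuses: dict, upstream: dict) -> dict:
    """Keep critical alerts, dropping tables with a critical ancestor."""
    def upstream_failing(table: str) -> bool:
        return any(
            statuses.get(parent) == "critical" or upstream_failing(parent)
            for parent in upstream.get(table, [])
        )

    return {
        table: status
        for table, status in statuses.items()
        if status == "critical" and not upstream_failing(table)
    }


check = VolumeCheck()
results = [check.observe(n) for n in [1000, 1010, 990, 1005, 995, 100, 120, 1002]]

statuses = {"raw": "critical", "staging": "ok", "mart": "critical", "other": "critical"}
upstream = {"staging": ["raw"], "mart": ["staging"]}  # child -> parents
to_send = alerts_to_send(statuses, upstream)
```

With these inputs the sudden drop to 100 rows only warns, the second low run escalates to critical, and the alert for `mart` is suppressed because its ancestor `raw` is already failing. Only critical results would page someone; warnings could go to a dashboard or a low-priority channel.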

I wrote a short piece on data quality with PySpark + Great Expectations on Databricks. I use this approach day-to-day and persist results to Delta tables to power a lightweight, optimized data-quality dashboard (pass/fail rates, trend lines, ownership). If helpful, here’s the link:
https://community.databricks.com/t5/knowledge-sharing-hub/data-quality-with-pyspark-and-great-expect...
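As a rough idea of what persisting results for a dashboard can look like, here is a small plain-Python sketch. It is not the schema from the linked article: `CheckResult`, its fields, and `pass_rates` are hypothetical, and on Databricks the rows would live in a Delta table and the aggregation would be a query rather than a Python loop.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CheckResult:
    """One row per check execution (hypothetical layout)."""
    run_date: date
    table: str
    check: str
    owner: str
    passed: bool


def pass_rates(results):
    """Return {(run_date, table): fraction of checks that passed}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, total]
    for r in results:
        key = (r.run_date, r.table)
        totals[key][0] += r.passed  # True counts as 1
        totals[key][1] += 1
    return {key: p / n for key, (p, n) in sorted(totals.items())}


rows = [
    CheckResult(date(2024, 1, 1), "orders", "not_null_id", "team-a", True),
    CheckResult(date(2024, 1, 1), "orders", "row_count", "team-a", False),
    CheckResult(date(2024, 1, 2), "orders", "not_null_id", "team-a", True),
    CheckResult(date(2024, 1, 2), "orders", "row_count", "team-a", True),
]
rates = pass_rates(rows)
```

Plotting `rates` by date gives the trend line, and keeping `owner` on every row lets the dashboard route a failing table to the right team.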

Hope that helps!

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa
