Hi @Sifflet,
This is a genuinely complex problem. You mentioned alerting and monitoring, but in my experience the biggest lever for reducing noise is to treat problems at the source, i.e. in the transformation layer: have the transformations enforce the core data rules, and let monitoring validate those rules rather than flag every blip. The key is balance: set guardrails that are meaningful (prioritize business-critical expectations), use rolling baselines and grace periods to avoid flapping, add severity levels, and suppress downstream alerts when an upstream dependency is already failing.
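As a rough illustration of the baseline/severity/suppression idea (all table names, columns, and thresholds below are placeholders I made up, not anything from your stack), the logic can be as simple as:

```python
# Sketch only: rolling baseline + severity tiers + upstream suppression for one table.
# Assumes a Databricks notebook (global `spark`) and hypothetical tables
# dq.daily_row_counts and dq.validation_results.
from pyspark.sql import functions as F

# Rolling 4-week baseline of daily row counts for the monitored table.
history = (
    spark.table("dq.daily_row_counts")
    .filter(F.col("table_name") == "silver.orders")
    .orderBy(F.col("run_date").desc())
    .limit(28)
)
stats = history.agg(
    F.avg("row_count").alias("mean"),
    F.stddev("row_count").alias("std"),
).first()

today_count = spark.table("silver.orders").count()

# Suppress the alert if the upstream table already failed validation today.
upstream_failed = (
    spark.table("dq.validation_results")
    .filter(F.col("table") == "bronze.orders")
    .filter(F.to_date("run_ts") == F.current_date())
    .filter(~F.col("success"))
    .count()
    > 0
)

severity = None
if not upstream_failed and stats["std"]:
    deviation = abs(today_count - stats["mean"])
    if deviation > 3 * stats["std"]:
        severity = "critical"
    elif deviation > 2 * stats["std"]:
        severity = "warning"

if severity:
    print(f"[{severity}] silver.orders count {today_count} vs baseline {stats['mean']:.0f}")
```

Warnings can feed a dashboard or daily digest while only criticals page someone; in my experience that split is where most of the noise reduction comes from.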
I wrote a short piece on data quality with PySpark + Great Expectations on Databricks. I use this approach day-to-day and persist the validation results to Delta tables to power a lightweight data-quality dashboard (pass/fail rates, trend lines, ownership). If helpful, here’s the link:
https://community.databricks.com/t5/knowledge-sharing-hub/data-quality-with-pyspark-and-great-expect...
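The core of that pattern, heavily simplified (table and column names are placeholders, and the legacy SparkDFDataset API shown here varies across Great Expectations versions), looks roughly like this:

```python
# Sketch only: validate a Spark DataFrame with the legacy Great Expectations
# SparkDFDataset API and append one summary row per run to a Delta table.
from datetime import datetime, timezone

from great_expectations.dataset import SparkDFDataset

df = spark.table("silver.orders")          # hypothetical source table
gdf = SparkDFDataset(df)

# Keep the suite small: business-critical expectations only.
gdf.expect_column_values_to_not_be_null("order_id")
gdf.expect_column_values_to_be_between("amount", min_value=0)

results = gdf.validate()

summary = spark.createDataFrame(
    [(
        "silver.orders",
        results["success"],
        results["statistics"]["evaluated_expectations"],
        results["statistics"]["unsuccessful_expectations"],
        datetime.now(timezone.utc).isoformat(),
    )],
    "table string, success boolean, evaluated int, failed int, run_ts string",
)

# Append to the Delta table that feeds the data-quality dashboard.
summary.write.format("delta").mode("append").saveAsTable("dq.validation_results")
```

Appending one summary row per run gives the dashboard its pass/fail rates and trend lines almost for free.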
Hope that helps!
Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa