Lakehouse Monitoring & Expectations
10-05-2024 12:35 PM
Dears
Has anyone successfully used the Lakehouse Monitoring and Expectations features together at scale to measure the data quality of tables - for example, to run freshness checks, consistency checks, etc.?
I would appreciate it if you could share lessons learned and best practices. I need to execute several consistency checks against several tables and have red/green quality indicators updated for them (as shown during the annual conference 🙂).
Labels: Delta Lake
12-11-2024 04:40 PM
Hello @noorbasha534
Thank you for reaching out and for your patience with this reply; below are some best practices:
- Monitor Data, Not Just Processes: Focus on monitoring the quality of your data, not just the processes that handle it. This approach helps catch issues early in the data pipeline.
- Set Expectation Rules: Expectations can help manage data quality, especially when using Delta Live Tables (DLT). You can drop, warn on, or quarantine rows that violate expectations, or fail the pipeline altogether (a minimal sketch follows this list).
- Leverage Unity Catalog Integration: Since Lakehouse Monitoring is built on Unity Catalog, it can track quality alongside governance, building toward a self-serve data platform.
- Utilize Automated Profiling: Use the automated profiling feature for any Delta table in Unity Catalog to quickly identify potential issues across your entire data estate (see the monitor-creation sketch after this list).
- Implement Proactive Alerting: Use the Expectations feature to set up notifications for quality issues as they arise, shifting from reactive to proactive monitoring (see the freshness-check sketch after this list).
- Customize Dashboards: Leverage Lakeview dashboard capabilities to create custom visualizations and collaborate across workspaces, teams, and stakeholders.
- Monitor Throughout the Data Lifecycle: Apply monitoring techniques at every step of the medallion architecture (bronze-silver-gold) to ensure data quality throughout the entire data lifecycle.
- Leverage Custom Metrics: Incorporate custom metrics tailored to your specific use case to gain deeper, more relevant insights into performance and data quality.
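To make the "Set Expectation Rules" point concrete, here is a minimal sketch of DLT expectations. It assumes a DLT pipeline notebook; the table and column names (raw.orders_raw, order_id, order_ts, amount) are placeholders for illustration.

```python
# Minimal DLT expectations sketch (placeholder table/column names).
import dlt

@dlt.table(comment="Bronze orders with basic quality gates")
@dlt.expect("non_negative_amount", "amount >= 0")               # warn: record the violation, keep the row
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")   # drop rows that violate the rule
@dlt.expect_or_fail("recent_data", "order_ts >= current_date() - INTERVAL 7 DAYS")  # fail the update (a simple freshness gate)
def orders_bronze():
    return spark.readStream.table("raw.orders_raw")
```

Quarantining bad rows is typically handled by writing a second table whose expectation is the inverse of the rule, so violating records land somewhere you can inspect them.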
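For the "Utilize Automated Profiling" point, here is a sketch of creating a monitor programmatically with the Databricks SDK for Python. Exact class and method names can vary by SDK version (the API was previously called lakehouse_monitors), and the catalog/schema/table names below are placeholders, so treat this as an outline rather than a definitive implementation.

```python
# Sketch: create a snapshot-profile monitor on a Unity Catalog table.
# Placeholder names: main.sales.orders, main.monitoring, /Workspace/Users/me/monitors.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import MonitorSnapshot

w = WorkspaceClient()

w.quality_monitors.create(
    table_name="main.sales.orders",             # fully qualified Unity Catalog table
    assets_dir="/Workspace/Users/me/monitors",  # where the generated dashboard assets live
    output_schema_name="main.monitoring",       # schema that receives the generated metrics tables
    snapshot=MonitorSnapshot(),                 # snapshot profile; time-series and inference profiles also exist
)
```

The monitor writes profile and drift metrics tables into the output schema, which you can query directly or explore through the auto-generated dashboard.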
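And for the "Implement Proactive Alerting" point, here is a very small freshness-check sketch along the lines of the red/green indicators you mentioned. It assumes a Databricks notebook where `spark` is defined; the table name, timestamp column, and 24-hour threshold are placeholders.

```python
# Sketch: simple freshness check that could feed a red/green indicator or a scheduled alert.
from pyspark.sql import functions as F

fresh_count = (
    spark.table("main.sales.orders")  # placeholder table
         .where(F.col("order_ts") >= F.expr("current_timestamp() - INTERVAL 24 HOURS"))
         .count()
)
is_fresh = fresh_count > 0  # green if at least one record arrived in the last 24 hours
print(f"records_last_24h={fresh_count}, is_fresh={is_fresh}")
```

In practice you would schedule something like this (or an equivalent SQL query behind a Databricks SQL alert) per table and surface the results on a dashboard.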
Here are some articles and videos about Lakehouse monitoring for your review:
- Navigating the Waters of Lakehouse Monitoring and Observability by eeezee (Databricks) - https://community.databricks.com/t5/technical-blog/navigating-the-waters-of-lakehouse-monitoring-and...
- Ensuring Quality Forecasts with Databricks Lakehouse Monitoring by Peter Park - https://www.databricks.com/blog/ensuring-quality-forecasts-databricks-lakehouse-monitoring
- Lakehouse Monitoring GA: Profiling, Diagnosing, and Enforcing Data Quality with Intelligence -
I hope this helps!
01-29-2025 04:19 PM
Not sure if you are still looking for this. Here is a Medium article - https://piethein.medium.com/data-quality-within-lakehouses-0c9417ce0487 - where you can see a detailed implementation.

