Lakehouse Monitoring & Expectations
10-05-2024 12:35 PM
Dears
Has anyone successfully used the Lakehouse Monitoring and Expectations features together at scale to measure the data quality of tables - for example, to run freshness checks, consistency checks, etc.?
I would appreciate it if you could share lessons learned and best practices. I need to execute several consistency checks against several tables and have red/green quality indicators updated for them (as shown during the annual conference 🙂).
Labels: Delta Lake
12-11-2024 04:40 PM
Hello @noorbasha534
Thank you for reaching out and for your patience with this reply; below are some best practices:
- Monitor Data, Not Just Processes: Focus on monitoring the quality of your data, not just the processes that handle it. This approach helps catch issues early in the data pipeline.
- Set Expectation Rules: Expectations can help manage data quality, especially when using Delta Live Tables (DLT). You can drop, warn on, or quarantine rows that violate expectations, or fail the pipeline altogether (a minimal sketch follows this list).
- Leverage Unity Catalog Integration: Since Lakehouse Monitoring is built on Unity Catalog, it can track quality alongside governance, building toward a self-serve data platform.
- Utilize Automated Profiling: Use the automated profiling feature for any Delta table in Unity Catalog to quickly identify potential issues across your entire data estate (see the monitor-creation sketch after this list).
- Implement Proactive Alerting: Use the Expectations feature to set up notifications for quality issues as they arise, shifting from reactive to proactive monitoring (see the freshness-check sketch after this list).
- Customize Dashboards: Leverage Lakeview dashboard capabilities to create custom visualizations and collaborate across workspaces, teams, and stakeholders.
- Monitor Throughout the Data Lifecycle: Apply monitoring techniques at every step of the medallion architecture (bronze-silver-gold) to ensure data quality throughout the entire data lifecycle.
- Leverage Custom Metrics: Incorporate custom metrics tailored to your specific use case to gain deeper, more relevant insights into performance and data quality.
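To make the "Set Expectation Rules" point concrete, here is a minimal sketch of DLT expectations. It assumes a DLT pipeline notebook; the table and column names (raw.orders_raw, order_id, order_ts, amount) are placeholders for illustration.

```python
# Minimal DLT expectations sketch (placeholder table/column names).
import dlt

@dlt.table(comment="Bronze orders with basic quality gates")
@dlt.expect("non_negative_amount", "amount >= 0")               # warn: record the violation, keep the row
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")   # drop rows that violate the rule
@dlt.expect_or_fail("recent_data", "order_ts >= current_date() - INTERVAL 7 DAYS")  # fail the update (a simple freshness gate)
def orders_bronze():
    return spark.readStream.table("raw.orders_raw")
```

Quarantining bad rows is typically handled by writing a second table whose expectation is the inverse of the rule, so violating records land somewhere you can inspect them.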
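For the "Utilize Automated Profiling" point, here is a sketch of creating a monitor programmatically with the Databricks SDK for Python. Exact class and method names can vary by SDK version (the API was previously called lakehouse_monitors), and the catalog/schema/table names below are placeholders, so treat this as an outline rather than a definitive implementation.

```python
# Sketch: create a snapshot-profile monitor on a Unity Catalog table.
# Placeholder names: main.sales.orders, main.monitoring, /Workspace/Users/me/monitors.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import MonitorSnapshot

w = WorkspaceClient()

w.quality_monitors.create(
    table_name="main.sales.orders",             # fully qualified Unity Catalog table
    assets_dir="/Workspace/Users/me/monitors",  # where the generated dashboard assets live
    output_schema_name="main.monitoring",       # schema that receives the generated metrics tables
    snapshot=MonitorSnapshot(),                 # snapshot profile; time-series and inference profiles also exist
)
```

The monitor writes profile and drift metrics tables into the output schema, which you can query directly or explore through the auto-generated dashboard.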
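And for the "Implement Proactive Alerting" point, here is a very small freshness-check sketch along the lines of the red/green indicators you mentioned. It assumes a Databricks notebook where `spark` is defined; the table name, timestamp column, and 24-hour threshold are placeholders.

```python
# Sketch: simple freshness check that could feed a red/green indicator or a scheduled alert.
from pyspark.sql import functions as F

fresh_count = (
    spark.table("main.sales.orders")  # placeholder table
         .where(F.col("order_ts") >= F.expr("current_timestamp() - INTERVAL 24 HOURS"))
         .count()
)
is_fresh = fresh_count > 0  # green if at least one record arrived in the last 24 hours
print(f"records_last_24h={fresh_count}, is_fresh={is_fresh}")
```

In practice you would schedule something like this (or an equivalent SQL query behind a Databricks SQL alert) per table and surface the results on a dashboard.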
Here are some articles and videos about Lakehouse monitoring for your review:
- Navigating the Waters of Lakehouse Monitoring and Observability by eeezee (Databricks) - https://community.databricks.com/t5/technical-blog/navigating-the-waters-of-lakehouse-monitoring-and...
- Ensuring Quality Forecasts with Databricks Lakehouse Monitoring by Peter Park - https://www.databricks.com/blog/ensuring-quality-forecasts-databricks-lakehouse-monitoring
- Lakehouse Monitoring GA: Profiling, Diagnosing, and Enforcing Data Quality with Intelligence -
I hope this helps!
01-29-2025 04:19 PM
Not sure if you are still looking for this. Here is a Medium article - https://piethein.medium.com/data-quality-within-lakehouses-0c9417ce0487 - where you can see a detailed implementation.

