Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Lakehouse Monitoring & Expectations

noorbasha534
Contributor

Dear all,

Has anyone successfully used the Lakehouse Monitoring and Expectations features together at scale to measure the data quality of tables, for example to run freshness checks, consistency checks, etc.?

I would appreciate it if you could share lessons learnt and best practices. I need to run several consistency checks against several tables and have their red/green quality status updated accordingly (as shown during the annual conference 🙂).

1 REPLY

mmayorga
Databricks Employee

Hello @noorbasha534 

Thank you for reaching out and for your patience with this reply; below are some of the best practices:

  1. Monitor Data, Not Just Processes: Focus on monitoring the quality of your data, not just the processes that handle it. This approach helps catch issues early in the data pipeline.
  2. Set Expectation Rules: Expectations help manage data quality, especially when using Delta Live Tables (DLT). You can warn on, drop, or quarantine rows that violate expectations, or fail the pipeline altogether (a minimal sketch follows this list).
  3. Leverage Unity Catalog Integration: Since Lakehouse Monitoring is built on Unity Catalog, it can track quality alongside governance, building toward a self-serve data platform.
  4. Utilize Automated Profiling: Use the automated profiling feature for any Delta table in Unity Catalog to quickly identify potential issues across your entire data estate (the second sketch after this list shows one way to create a monitor programmatically).
  5. Implement Proactive Alerting: Use the Expectations feature to set up notifications for quality issues as they arise, shifting from reactive to proactive monitoring (a query sketch near the end of this reply shows one way to derive a red/green status from the monitor's output tables).
  6. Customize Dashboards: Leverage Lakeview dashboard capabilities to create custom visualizations and collaborate across workspaces, teams, and stakeholders.
  7. Monitor Throughout the Data Lifecycle: Apply monitoring techniques at every step of the medallion architecture (bronze-silver-gold) to ensure data quality throughout the entire data lifecycle.
  8. Leverage Custom Metrics: Incorporate custom metrics tailored to your specific use case to gain deeper, more relevant insights into performance and data quality.
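
To make point 2 concrete, here is a minimal sketch of DLT expectations in Python. The table and column names (orders_bronze, order_id, amount, ingest_ts) are placeholders and the thresholds are only examples; adapt the constraints to your own freshness and consistency rules.

import dlt

@dlt.table(comment="Silver orders with basic quality gates")
@dlt.expect("non_negative_amount", "amount >= 0")              # warn: keep the row, record the violation
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop rows that violate the constraint
@dlt.expect_or_fail("fresh_within_1_day",
                    "ingest_ts >= current_timestamp() - INTERVAL 1 DAY")  # fail the update on stale data
def orders_silver():
    # Hypothetical bronze source table in the same pipeline
    return dlt.read_stream("orders_bronze")

# Quarantine pattern: route violating rows to a side table instead of silently dropping them.
@dlt.table(comment="Rows that failed the order_id check, kept for inspection")
def orders_quarantine():
    return dlt.read_stream("orders_bronze").where("order_id IS NULL")

The warn/drop/fail behaviours map to the three expectation decorators; the quarantine table is simply a second table built from the inverted predicate.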

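For points 3, 4, and 8, a monitor can also be created programmatically. The sketch below assumes the databricks-lakehouse-monitoring Python client; the exact module and signatures can vary by release, so please check the current documentation, and all catalog, schema, and table names here (main.sales.orders_silver, main.sales_monitoring) are placeholders.

import databricks.lakehouse_monitoring as lm

# Attach a snapshot-profile monitor to a Unity Catalog table and add one
# table-level custom metric (percentage of NULL order_ids).
lm.create_monitor(
    table_name="main.sales.orders_silver",        # placeholder table
    profile_type=lm.Snapshot(),
    output_schema_name="main.sales_monitoring",   # placeholder schema for the metric tables
    custom_metrics=[
        lm.Metric(
            type="aggregate",
            name="pct_null_order_id",
            input_columns=[":table"],
            definition="100.0 * count_if(order_id IS NULL) / count(*)",
            output_data_type="double",
        )
    ],
)

The monitor writes profile and drift metric Delta tables into the output schema and generates a dashboard you can customize (point 6).
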
Here are some articles and videos about Lakehouse monitoring for your review:

  1. Navigating the Waters of Lakehouse Monitoring and Observability by eeezee (Databricks) - https://community.databricks.com/t5/technical-blog/navigating-the-waters-of-lakehouse-monitoring-and... 
  2. Ensuring Quality Forecasts with Databricks Lakehouse Monitoring by Peter Park - https://www.databricks.com/blog/ensuring-quality-forecasts-databricks-lakehouse-monitoring 
  3. Lakehouse Monitoring GA: Profiling, Diagnosing, and Enforcing Data Quality with Intelligence - 
    1. https://www.youtube.com/watch?v=aDBPoKyA0DQ 
    2. https://www.databricks.com/blog/lakehouse-monitoring-ga-profiling-diagnosing-and-enforcing-data-qual... 
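
Finally, on the red/green lights you mentioned: once a monitor has run, its output metric tables can be queried like any other Delta table, so a scheduled job or a Databricks SQL alert can turn a threshold into a status (points 5 to 7). Below is a rough Python sketch for a notebook where spark is available, assuming the monitor and custom metric from the earlier example; the metric table name follows the <table>_profile_metrics convention and is hypothetical here, as is the 1% threshold.

# Read the profile metrics table the monitor produced in the output schema.
profile = spark.table("main.sales_monitoring.orders_silver_profile_metrics")

latest = (
    profile
    .where("column_name = ':table'")            # keep only table-level rows
    .orderBy("window.start", ascending=False)   # most recent monitoring window first
    .select("window", "pct_null_order_id")
    .limit(1)
)

row = latest.first()
status = "green" if row is not None and row["pct_null_order_id"] < 1.0 else "red"
print(status)

A Databricks SQL alert defined on the same table gives you the notification side of this without writing a job.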

I hope this helps!

