Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What's the best framework/package for data quality?

William_Scardua
Valued Contributor

Hi everyone,

I'm currently looking for a data-quality solution for my environment. I don't have DLT tables or a Unity Catalog in place.

In your opinion, what is the best framework or package to implement reliable data-quality checks under these conditions?

Thanks in advance!

1 REPLY

nayan_wylde
Esteemed Contributor

Here are a few DQ packages you can try.

1. Databricks Labs DQX

  • Purpose-built for Spark and Databricks.
  • Rule-based checks on DataFrames (batch & streaming).
  • Supports quarantine and profiling.
  • Lightweight and easy to integrate.
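The rule-based check and quarantine pattern that DQX implements can be sketched in plain Python. This is an illustrative, library-free sketch of the concept, not the DQX API; all function and field names here are hypothetical:

```python
# Minimal sketch of rule-based checks with a quarantine path:
# rows that fail any check are routed aside with the failure reasons attached.

def not_null(field):
    # Rule: the field must be present and non-None.
    return lambda row: row.get(field) is not None

def in_range(field, lo, hi):
    # Rule: the field must be a value within [lo, hi].
    return lambda row: row.get(field) is not None and lo <= row[field] <= hi

def apply_checks(rows, checks):
    """Split rows into (valid, quarantined) based on named checks."""
    valid, quarantined = [], []
    for row in rows:
        failures = [name for name, check in checks if not check(row)]
        if failures:
            quarantined.append({**row, "_dq_failures": failures})
        else:
            valid.append(row)
    return valid, quarantined

rows = [
    {"id": 1, "age": 35},
    {"id": 2, "age": None},
    {"id": 3, "age": 210},
]
checks = [
    ("age_not_null", not_null("age")),
    ("age_in_range", in_range("age", 0, 120)),
]
valid, quarantined = apply_checks(rows, checks)
# valid keeps row 1; rows 2 and 3 are quarantined with their failed rule names
```

DQX applies the same idea to Spark DataFrames (batch and streaming), writing failing rows to a quarantine table instead of dropping them silently.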

2. Great Expectations

  • Popular Python library for data validation.
  • Works with Spark, Pandas, SQL.
  • Rich set of expectations and auto-generated documentation.
  • Best for governance and transparency.

3. Cuallee

  • Lightweight, fast, and DataFrame-agnostic.
  • Supports PySpark, Pandas, Polars, DuckDB.
  • 50+ built-in checks, minimal setup.

4. Spark Expectations

  • Designed for Spark environments.
  • Uses decorators for defining rules.
  • Provides error tables and stats for monitoring.
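The decorator idea behind Spark Expectations can be illustrated in plain Python: wrap a transformation so its output is validated and failing rows are captured in an error table. This is a conceptual sketch, not the actual spark-expectations API; the names are hypothetical:

```python
import functools

# Stand-in for the Delta error table that Spark Expectations writes to.
ERROR_TABLE = []

def with_expectations(rules):
    """Decorator: validate each output row against named rules;
    failing rows go to ERROR_TABLE, passing rows are returned."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            good = []
            for row in func(*args, **kwargs):
                failed = [name for name, rule in rules.items() if not rule(row)]
                if failed:
                    ERROR_TABLE.append({"row": row, "failed_rules": failed})
                else:
                    good.append(row)
            return good
        return wrapper
    return decorator

@with_expectations({"positive_amount": lambda r: r["amount"] > 0})
def load_orders():
    # A toy transformation producing rows to validate.
    return [{"order": "A", "amount": 10}, {"order": "B", "amount": -5}]

orders = load_orders()
# orders keeps only order A; order B lands in ERROR_TABLE with the failed rule
```

The appeal of the decorator style is that the DQ rules live next to the transformation they guard, rather than in a separate validation step.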

5. Pandas-DQ

  • For quick profiling and cleaning in Pandas.
  • HTML reports, duplicate/missing value checks.
  • Ideal for small datasets or pre-ingestion checks.
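The kind of duplicate and missing-value checks that pandas-dq automates can be done directly with plain pandas; this sketch uses only standard pandas calls, not the pandas-dq API:

```python
import pandas as pd

# Toy dataset with one missing email and one duplicate id.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "email": ["a@x.com", None, "b@x.com", "b@x.com"],
})

# Missing values per column.
missing_per_column = df.isna().sum()

# Count of duplicated ids (second and later occurrences).
duplicate_ids = int(df["id"].duplicated().sum())
```

For small datasets or quick pre-ingestion sanity checks, a few lines like this may be all you need; pandas-dq adds the HTML reporting on top.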
