Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Prakash Hinduja Geneva (Swiss) Can I use tools like Great Expectations with Databricks?

prakashhinduja1
New Contributor

Hi everyone,

I am Prakash Hinduja from Geneva, Switzerland (Swiss), currently exploring ways to improve data quality checks in my Databricks pipelines, and I came across Great Expectations. I'd love to know if anyone here has experience using it with Databricks.

 

Regards

Prakash Hinduja from Geneva, Switzerland (Swiss) 

1 ACCEPTED SOLUTION

Accepted Solutions

Nir_Hedvat
Databricks Employee

Hi Prakash,
Yes, Great Expectations integrates well with Databricks and is commonly used to enforce data quality checks in pipelines, for example validating schemas, nulls, value ranges, or business rules.

You can use it in a few ways:

  • Directly in Python notebooks using %pip install great_expectations (see the sketch after this list)

  • As part of a job or task within a Databricks workflow

  • Embedded in custom ETL/ELT logic to validate input or output datasets

  • Optionally generate data docs for reporting and audit
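
For the notebook route, here is a minimal sketch using the legacy SparkDFDataset wrapper (run %pip install great_expectations in a cell first). The exact API depends on your Great Expectations version (newer GX releases use a fluent context API instead), and raw.users is just a placeholder table:

from great_expectations.dataset import SparkDFDataset

df = spark.read.table("raw.users")   # placeholder source table
ge_df = SparkDFDataset(df)           # wrap the Spark DataFrame

ge_df.expect_column_values_to_not_be_null("id")
ge_df.expect_column_values_to_be_between("age", min_value=0, max_value=120)

results = ge_df.validate()           # run all expectations at once
if not results.success:
    raise ValueError(f"Data quality checks failed: {results}")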

That said, if you're using DLT (now part of Lakeflow), Databricks provides native expectations out of the box. You can define them declaratively like this:

import dlt

@dlt.table
@dlt.expect("non_null_id", "id IS NOT NULL")                # log violations, keep rows
@dlt.expect_or_drop("valid_age", "age BETWEEN 0 AND 120")   # drop rows that fail
def clean_users():
    return spark.read.table("raw.users")

These expectations automatically track data quality: expect logs violations while keeping the rows, expect_or_drop drops invalid records, and expect_or_fail stops the pipeline entirely. All results are stored in the DLT event log for visibility.
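
If you want to report on those results, one rough sketch is to query the event log from a notebook. This assumes the pipeline publishes its tables to Unity Catalog so the event_log() function can be addressed by table name; my_catalog.my_schema.clean_users is a placeholder:

# Pull per-flow expectation metrics from the DLT event log (placeholder table name)
quality_events = spark.sql("""
    SELECT
      timestamp,
      details:flow_progress.data_quality.expectations AS expectations
    FROM event_log(TABLE(my_catalog.my_schema.clean_users))
    WHERE event_type = 'flow_progress'
      AND details:flow_progress.data_quality.expectations IS NOT NULL
""")
display(quality_events)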

If you're already on DLT, native expectations are usually the best starting point.

 
