Hi Prakash,
Yes, Great Expectations integrates well with Databricks and is commonly used to enforce data quality checks in pipelines, for example validating schemas, nulls, value ranges, or business rules.
You can use it in a few ways:
- Directly in Python notebooks, after installing it with %pip install great_expectations (see the sketch after this list)
- As part of a job or task within a Databricks workflow
- Embedded in custom ETL/ELT logic to validate input or output datasets
- Optionally, generating Data Docs for reporting and audit
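For the notebook route, here is a minimal sketch of what a check might look like. It uses the older SparkDFDataset-style API (newer Great Expectations releases use a context/validator workflow instead), and raw.users plus the column names are just placeholders:

from great_expectations.dataset import SparkDFDataset

# Wrap an existing Spark DataFrame so expectations can run against it
users_df = spark.read.table("raw.users")
ge_users = SparkDFDataset(users_df)

# The same kinds of rules you would enforce in a pipeline: nulls and value ranges
ge_users.expect_column_values_to_not_be_null("id")
ge_users.expect_column_values_to_be_between("age", min_value=0, max_value=120)

# validate() runs every expectation and reports overall success
results = ge_users.validate()
if not results.success:
    raise ValueError("Data quality checks failed; see the validation results for details")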
That said, if you're using DLT (now part of Lakeflow), Databricks provides native expectations out of the box. You can define them declaratively like this:
import dlt

@dlt.table
@dlt.expect("non_null_id", "id IS NOT NULL")
@dlt.expect_or_drop("valid_age", "age BETWEEN 0 AND 120")
def clean_users():
    return spark.read.table("raw.users")
These expectations track data quality automatically: depending on the decorator, violations are logged (expect), invalid records are dropped (expect_or_drop), or the pipeline fails entirely (expect_or_fail), and all results are recorded in the DLT event log for visibility.
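If you need to fail fast on a critical rule, or group several rules under one decorator, the fail and expect_all variants cover that. A minimal sketch, where the table and column names are placeholders:

import dlt

@dlt.table
@dlt.expect_or_fail("valid_order_date", "order_date <= current_date()")
@dlt.expect_all_or_drop({
    "non_null_customer": "customer_id IS NOT NULL",
    "positive_amount": "amount > 0",
})
def clean_orders():
    return spark.read.table("raw.orders")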
If you're already on DLT, native expectations are usually the best starting point.