Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Prakash Hinduja Geneva (Swiss) Can I use tools like Great Expectations with Databricks?

prakashhinduja1
New Contributor

Hi everyone,

I am Prakash Hinduja from Geneva, Switzerland (Swiss), currently exploring ways to improve data quality checks in my Databricks pipelines, and I came across Great Expectations. I'd love to know if anyone here has experience using it with Databricks.

 

Regards

Prakash Hinduja from Geneva, Switzerland (Swiss) 

1 ACCEPTED SOLUTION

Accepted Solutions

Nir_Hedvat
Databricks Employee

Hi Prakash,
Yes, Great Expectations integrates well with Databricks and is commonly used to enforce data quality checks in pipelines, for example validating schemas, nulls, value ranges, or business rules.

You can use it in a few ways:

  • Directly in Python notebooks using %pip install great_expectations (see the sketch after this list)

  • As part of a job or task within a Databricks workflow

  • Embedded in custom ETL/ELT logic to validate input or output datasets

  • Optionally generate data docs for reporting and audit
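
For the notebook route, here is a minimal sketch using the legacy SparkDFDataset wrapper (run %pip install great_expectations in a cell first). The exact API depends on your Great Expectations version (newer GX releases use a fluent context API instead), and raw.users is just a placeholder table:

from great_expectations.dataset import SparkDFDataset

df = spark.read.table("raw.users")   # placeholder source table
ge_df = SparkDFDataset(df)           # wrap the Spark DataFrame

ge_df.expect_column_values_to_not_be_null("id")
ge_df.expect_column_values_to_be_between("age", min_value=0, max_value=120)

results = ge_df.validate()           # run all expectations at once
if not results.success:
    raise ValueError(f"Data quality checks failed: {results}")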

That said, if you're using DLT (now part of Lakeflow), Databricks provides native expectations out of the box. You can define them declaratively like this:

import dlt

@dlt.table
@dlt.expect("non_null_id", "id IS NOT NULL")                # log violations, keep rows
@dlt.expect_or_drop("valid_age", "age BETWEEN 0 AND 120")   # drop rows that fail
def clean_users():
    return spark.read.table("raw.users")

These expectations automatically track data quality: expect logs violations while keeping the rows, expect_or_drop drops invalid records, and expect_or_fail stops the pipeline entirely. All results are stored in the DLT event log for visibility.
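
If you want to report on those results, one rough sketch is to query the event log from a notebook. This assumes the pipeline publishes its tables to Unity Catalog so the event_log() function can be addressed by table name; my_catalog.my_schema.clean_users is a placeholder:

# Pull per-flow expectation metrics from the DLT event log (placeholder table name)
quality_events = spark.sql("""
    SELECT
      timestamp,
      details:flow_progress.data_quality.expectations AS expectations
    FROM event_log(TABLE(my_catalog.my_schema.clean_users))
    WHERE event_type = 'flow_progress'
      AND details:flow_progress.data_quality.expectations IS NOT NULL
""")
display(quality_events)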

If you're already on DLT, native expectations are usually the best starting point.

 
