Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Data Quality in Databricks

Phani1
Valued Contributor II

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach to apply data quality rules on the bronze layer before proceeding to the silver and gold layers?

6 REPLIES

AndrewN
New Contributor III

Check out dbdemos.ai; you may be interested in the example of applying tests to your DLT pipeline to ensure data quality.

https://www.dbdemos.ai/demo.html?demoName=dlt-unit-test

Phani1
Valued Contributor II

Thanks for sharing the details; we are using DBR 12.2.

We are facing the error below while importing the libraries/packages (screenshot attachment: import_error). Can you please help me with how to import them?

Kaniz_Fatma
Community Manager

Hi @Phani1,

• Databricks recommends applying data quality rules on the bronze layer before proceeding to the silver and gold layers.
• The recommended approach involves storing data quality rules in a Delta table.
• The rules are categorized by a tag and are used in dataset definitions to determine which constraints to apply.
• A table named 'rules' is created to maintain the data quality rules.
• The rules are defined using SQL constraint clauses.
• A function called 'get_rules()' reads the rules from the 'rules' table and returns a Python dictionary containing the rules matching the provided tag.
• The dictionary of rules is then applied using the '@dlt.expect_all_*()' decorators to enforce data quality constraints.
• The 'get_farmers_market_data()' function is decorated with '@dlt.expect_all_or_drop()', which applies the constraints returned by 'get_rules()' to the 'raw_farmers_market' dataset (see the sketch below).
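For reference, here is a minimal sketch of that pattern for a DLT Python pipeline. The 'rules' table schema (columns name, constraint, tag), the 'validity' tag, and the source path are assumptions; adapt them to your environment.

```python
import dlt
from pyspark.sql.functions import col

def get_rules(tag):
    """Return a dict of {rule_name: SQL constraint} for rules matching the tag."""
    rules_df = spark.read.table("rules").filter(col("tag") == tag)
    return {row["name"]: row["constraint"] for row in rules_df.collect()}

@dlt.table
# Drop any row that violates one of the 'validity' rules
@dlt.expect_all_or_drop(get_rules("validity"))
def get_farmers_market_data():
    # Hypothetical raw source feeding the bronze layer
    return (spark.read.format("csv")
            .option("header", True)
            .load("/path/to/raw_farmers_market"))
```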

joarobles
New Contributor III

You could also apply data quality checks at the ETL level using open-source libraries such as Great Expectations or PyDeequ.
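For example, a minimal PyDeequ sketch for checking a bronze table (the table name, column names, and check choices are hypothetical; PyDeequ also needs the Deequ JAR on the cluster and, in recent versions, a SPARK_VERSION environment variable set to match your Spark version):

```python
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

df = spark.read.table("bronze.orders")  # hypothetical bronze table

check = (Check(spark, CheckLevel.Error, "bronze quality checks")
         .isComplete("order_id")      # no nulls
         .isUnique("order_id")        # no duplicates
         .isNonNegative("amount"))    # no negative values

result = VerificationSuite(spark).onData(df).addCheck(check).run()

# One row per constraint with its pass/fail status
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)
```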

Another approach is to use no-code platforms like Rudol, which let non-technical roles such as Data Stewards implement data quality validations themselves.

aalanis
New Contributor II

Hi there,

You should check out this Python library for data quality checks:

https://canimus.github.io/cuallee/

It is very fast and feature-rich when it comes to checks.
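For illustration, a minimal cuallee sketch against a Spark DataFrame (the table and column names are hypothetical, and this assumes cuallee's PySpark support):

```python
from cuallee import Check, CheckLevel

df = spark.read.table("bronze.orders")  # hypothetical bronze table

check = Check(CheckLevel.WARNING, "bronze_checks")
check.is_complete("order_id").is_unique("order_id")

# validate() returns a DataFrame with one row per rule and its status
check.validate(df).show(truncate=False)
```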

joarobles
New Contributor III

Looks nice! However, I don't see Databricks support in the docs 😕
