Hi Databricks Team, I would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach for applying data quality rules on the bronze layer before proceeding to the silver and gold layers?
• Databricks recommends applying data quality rules on the bronze layer before proceeding to the silver and gold layers.
• The recommended approach involves storing data quality rules in a Delta table.
• The rules are categorized by a tag, which is used in dataset definitions to determine which rules to apply.
• A table named 'rules' is created to maintain the data quality rules.
• The rules are defined using SQL constraint clauses.
• A function called 'get_rules()' is created to read the rules from the 'rules' table and return a Python dictionary containing the rules that match the provided tag.
• The dictionary of rules is then applied using the '@dlt.expect_all_*()' decorators to enforce data quality constraints.
• The 'get_farmers_market_data()' function is decorated with the '@dlt.expect_all_or_drop()' decorator, which applies the data quality constraints returned by 'get_rules()' to the 'raw_farmers_market' table.
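The rules-table pattern described above can be sketched roughly as follows. This is a minimal illustration, not the exact Databricks sample: the column names `name`, `constraint`, and `tag`, the sample rule values, and the `SAMPLE_RULES` list are assumptions made for the example. To keep it runnable outside a pipeline, `get_rules()` here takes the rows as an argument instead of reading the 'rules' table itself; the DLT-specific part is shown only in comments because the `dlt` module is available only inside a Databricks pipeline.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical sample rows standing in for the 'rules' Delta table.
# Assumed columns: name (rule name), constraint (SQL clause), tag (category).
SAMPLE_RULES = [
    {"name": "website_not_null", "constraint": "Website IS NOT NULL", "tag": "validity"},
    {"name": "location_not_null", "constraint": "Location IS NOT NULL", "tag": "validity"},
    {"name": "positive_price", "constraint": "Price > 0", "tag": "pricing"},
]


def get_rules(rows: Iterable[Mapping[str, str]], tag: str) -> Dict[str, str]:
    """Return {rule name: SQL constraint} for every rule carrying the given tag."""
    return {row["name"]: row["constraint"] for row in rows if row["tag"] == tag}


validity_rules = get_rules(SAMPLE_RULES, "validity")
print(validity_rules)

# Inside a DLT pipeline (not runnable elsewhere), the rows could be loaded with
#   rows = [r.asDict() for r in spark.read.table("rules").collect()]
# and the resulting dictionary passed to an expectation decorator:
#   @dlt.table(name="raw_farmers_market")
#   @dlt.expect_all_or_drop(get_rules(rows, "validity"))
#   def get_farmers_market_data():
#       ...  # return the source DataFrame here
```

Keeping the rules in a table rather than in the pipeline code means a rule can be added or retagged without editing the dataset definitions that consume it.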