Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What Data Quality Framework do you use/recommend?

William_Scardua
Valued Contributor

Hi guys,

In your opinion, what is the best Data Quality Framework (or technique)? Which one do you recommend?

 

3 REPLIES

joarobles
New Contributor III

Hi there!

You could also take a look at Rudol. It has native Databricks support and covers Data Quality validations and Data Governance. It also lets non-technical roles such as Business Analysts or Data Stewards take part in data quality, through no-code validations and integrations with everyday tools like Slack or Microsoft Teams.

Have a high-quality day!  

dataoculus_app
New Contributor II

There are many DQ tools and platforms, but most are SQL-based, so they add cost and their results are delayed. So it really depends on your use case and problem statement. Sometimes it makes sense to build your own, but most of the time it does not if it is meant to be used as a central service.

chanukya-pekala
Contributor

DQ is interesting. There are a lot of options in this space. SODA and Great Expectations are fairly well integrated with a Databricks setup.

I personally try to use DataFrame abstractions for validation. We used deequ, which is very simple to use: just pass your Spark DataFrame to the code and the validations run inside your Spark session (if they need to); otherwise we can decouple the DQ into separate classes in the package. I have spent some time working with it and wrote this blog post - https://datatribe.substack.com/p/deequ-an-open-source-data-quality 
It's a DQ tool for data engineers, I would say. Interestingly, we can also write the deequ result DataFrames out as Delta tables to see the quality patterns. The maintainer is AWS Labs. https://github.com/awslabs/deequ 
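
To give a rough idea of the pattern (declare checks, compute a metric per check, get back result rows you could store as a table), here is a minimal sketch in plain Python so it runs without a Spark session. This is not deequ's API: `Check`, `completeness`, `distinct_ratio` and `run_checks` are hypothetical names made up for illustration, and the sample rows are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A "row" here is just a dict; in a real setup this would be a Spark DataFrame.
Row = Dict[str, Optional[object]]


@dataclass
class Check:
    """Hypothetical check: a metric plus a rule that the metric must satisfy."""
    name: str
    metric: Callable[[List[Row]], float]  # computes a number from the data
    passes: Callable[[float], bool]       # decides whether that number is acceptable


def completeness(column: str) -> Callable[[List[Row]], float]:
    """Fraction of rows where `column` is not None (1.0 for empty input)."""
    def _metric(rows: List[Row]) -> float:
        if not rows:
            return 1.0
        return sum(r.get(column) is not None for r in rows) / len(rows)
    return _metric


def distinct_ratio(column: str) -> Callable[[List[Row]], float]:
    """Number of distinct non-null values divided by number of non-null values."""
    def _metric(rows: List[Row]) -> float:
        values = [r[column] for r in rows if r.get(column) is not None]
        if not values:
            return 1.0
        return len(set(values)) / len(values)
    return _metric


def run_checks(rows: List[Row], checks: List[Check]) -> List[Dict[str, object]]:
    """Evaluate every check and return one result row per check."""
    results = []
    for check in checks:
        value = check.metric(rows)
        status = "Success" if check.passes(value) else "Failure"
        results.append({"check": check.name, "value": value, "status": status})
    return results


# Invented sample data: one duplicate id and one missing country.
rows = [
    {"id": 1, "country": "FI"},
    {"id": 2, "country": None},
    {"id": 2, "country": "SE"},
    {"id": 4, "country": "FI"},
]

checks = [
    Check("id is complete", completeness("id"), lambda v: v == 1.0),
    Check("id has no duplicates", distinct_ratio("id"), lambda v: v == 1.0),
    Check("country is at least 70% complete", completeness("country"), lambda v: v >= 0.7),
]

results = run_checks(rows, checks)
for row in results:
    print(row)
```

The result rows have a fixed shape (check name, metric value, status), which is what makes it easy to append them to a table after every run and look at how quality changes.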

In addition, I would like to use spark-expectations, open-sourced by Nike - https://github.com/Nike-Inc/spark-expectations 

Chanukya
