I’m currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose.
In the past, I’ve worked with Deequ, but I’ve noticed that it’s not as widely used anymore, and I’ve heard promising things about other solutions, such as Great Expectations. I’m curious to learn about your experiences:
- What frameworks or tools are you using for Data Quality in Databricks today?
- How do you approach DQ monitoring, validation, and automation in your pipelines?
- Are there any specific challenges or best practices you'd like to share?
Any insights or recommendations would be greatly appreciated. Looking forward to hearing your thoughts!