Hi Everyone,
One use case I'm proud of was implementing data quality checks in Databricks pipelines.
I designed a solution that runs validations with PySpark and Great Expectations and persists the results of the data quality (DQ) analysis into a Delta table. I later used this data to build a dashboard with a global view of data quality across the environment, which gave teams much better visibility into data quality and more confidence in the pipelines.
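For anyone curious about the mechanics, here is a minimal sketch of the pattern. It assumes the legacy SparkDFDataset API from Great Expectations (pre-1.0) and hypothetical table and column names (orders, order_id, amount, dq.validation_results); the actual solution in the linked post may differ in its details.

```python
# Minimal sketch: validate a Spark DataFrame with Great Expectations
# (legacy SparkDFDataset API) and persist the results to a Delta table.
from datetime import datetime, timezone

from great_expectations.dataset import SparkDFDataset
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical source table to validate
df = spark.table("orders")

# Wrap the DataFrame and declare a couple of example expectations
gdf = SparkDFDataset(df)
gdf.expect_column_values_to_not_be_null("order_id")
gdf.expect_column_values_to_be_between("amount", min_value=0)

# Run all declared expectations and convert the result to a plain dict
result_dict = gdf.validate().to_json_dict()

# Flatten each expectation result into one row per check
run_ts = datetime.now(timezone.utc).isoformat()
rows = [
    Row(
        run_ts=run_ts,
        expectation=r["expectation_config"]["expectation_type"],
        column=r["expectation_config"]["kwargs"].get("column"),
        success=r["success"],
    )
    for r in result_dict["results"]
]

# Append the flattened results to a Delta table (hypothetical name);
# a dashboard can then query this table for a global DQ view.
(
    spark.createDataFrame(rows)
    .write.format("delta")
    .mode("append")
    .saveAsTable("dq.validation_results")
)
```

Appending one row per check per run keeps the full history, so the dashboard can show trends over time and not just the latest status.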
I even shared a detailed post about this here, in case it's helpful for others:
Data Quality with PySpark and Great Expectations on Databricks: https://community.databricks.com/t5/knowledge-sharing-hub/data-quality-with-pyspark-and-great-expect...
It was a great success story for me, and also very rewarding to see how small improvements can bring big value.
Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa