Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Valued Contributor II
  • 8081 Views
  • 5 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach for applying data quality rules on the bronze layer before proceeding to the silver and gold layers?
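
One possible approach outside DLT, offered only as a minimal sketch: express each rule as a Spark SQL boolean expression and split the bronze DataFrame into rows that pass every rule and rows to quarantine. The rule names, the column names (`id`, `amount`), and the helper functions below are all hypothetical and not taken from the thread; the only Spark call relied on is `DataFrame.filter` with a SQL expression string.

```python
from typing import Dict, Tuple

# Hypothetical rules: rule name -> Spark SQL boolean expression.
# The column names (id, amount) are illustrative assumptions.
RULES: Dict[str, str] = {
    "valid_id": "id IS NOT NULL",
    "non_negative_amount": "amount >= 0",
}


def combined_predicate(rules: Dict[str, str]) -> str:
    """Build one SQL predicate that is true only when every rule holds.

    Each rule is wrapped in COALESCE(..., FALSE) so that a rule which
    evaluates to NULL (for example, a comparison on a NULL value)
    counts as a failure instead of silently dropping the row.
    """
    if not rules:
        return "TRUE"
    return " AND ".join(f"COALESCE(({expr}), FALSE)" for expr in rules.values())


def split_by_rules(df, rules: Dict[str, str]) -> Tuple[object, object]:
    """Return (passed, quarantined) for a Spark DataFrame `df`."""
    predicate = combined_predicate(rules)
    passed = df.filter(predicate)
    quarantined = df.filter(f"NOT ({predicate})")
    return passed, quarantined
```

In a notebook this might be used as `passed, quarantined = split_by_rules(bronze_df, RULES)`, where `bronze_df` is an assumed bronze DataFrame: `passed` would feed the silver layer and `quarantined` would be written to a separate table for inspection.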

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Looks nice! However, I don't see Databricks support in the docs.

4 More Replies
Kash
by Contributor III
  • 1790 Views
  • 1 reply
  • 0 kudos

Data-quality help: Save Data Profile dbutils.data.summarize(df) to table

Hi there, we would like to create a data quality database that helps us understand how complete our data is. We would like to run a job each day that basically outputs the same table data as dbutils.data.summarize(df) for a given table and save it to ...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

From what I know, there's no easy way to save the output of dbutils.data.summarize() into a DataFrame. You can still write custom Python/PySpark code to profile your data and save the output.
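
A minimal sketch of what such custom profiling code could look like, assuming completeness per column is the main metric of interest. The function names, the output schema, and the table names are hypothetical; this does not reproduce everything dbutils.data.summarize() shows. It computes row count, null count, completeness, and approximate distinct count per column, then appends one row per column to a profile table.

```python
from datetime import date
from typing import Dict, List, Sequence, Tuple

# Assumed schema of the profile table, one row per profiled column.
PROFILE_SCHEMA = (
    "table_name string, run_date date, column_name string, "
    "row_count bigint, null_count bigint, completeness double, "
    "approx_distinct bigint"
)


def profile_exprs(columns: Sequence[str]) -> List[str]:
    """SQL aggregate expressions for a single-row profile of all columns."""
    exprs = ["COUNT(*) AS `__row_count`"]
    for c in columns:
        # COUNT(col) ignores NULLs, so it gives the non-null count.
        exprs.append(f"COUNT(`{c}`) AS `{c}__non_null`")
        exprs.append(f"APPROX_COUNT_DISTINCT(`{c}`) AS `{c}__approx_distinct`")
    return exprs


def to_long_rows(
    table_name: str, columns: Sequence[str], stats: Dict[str, int], run_date: date
) -> List[Tuple]:
    """Reshape the single wide stats row into one tuple per column."""
    total = stats["__row_count"]
    rows = []
    for c in columns:
        non_null = stats[f"{c}__non_null"]
        completeness = (non_null / total) if total else None
        rows.append(
            (table_name, run_date, c, total, total - non_null,
             completeness, stats[f"{c}__approx_distinct"])
        )
    return rows


def profile_table(spark, table_name: str, target_table: str) -> None:
    """Profile `table_name` and append the result to `target_table`."""
    df = spark.table(table_name)
    stats = df.selectExpr(*profile_exprs(df.columns)).first().asDict()
    rows = to_long_rows(table_name, df.columns, stats, date.today())
    (spark.createDataFrame(rows, schema=PROFILE_SCHEMA)
        .write.mode("append").saveAsTable(target_table))
```

A daily job could then call something like `profile_table(spark, "bronze.events", "dq.column_profile")`, where both table names are placeholders. Column names containing backticks, and column types that `APPROX_COUNT_DISTINCT` cannot handle, would need extra handling. For basic numeric statistics, `DataFrame.summary()` also returns a DataFrame that can be saved directly.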

hf_santos
by New Contributor III
  • 7068 Views
  • 4 replies
  • 2 kudos

Resolved! Error when importing PyDeequ package

Hi everyone, I want to run some data quality tests, and for that I intend to use PyDeequ in a Databricks notebook. Keep in mind that I'm very new to Databricks and Spark. First I created a cluster with the Runtime version "10.4 LTS (includes A...

Latest Reply
hf_santos
New Contributor III
  • 2 kudos

I assumed I wouldn't need to add the Deequ library. Apparently, all I had to do was add it via Maven coordinates and it solved the problem.

3 More Replies