
Data preparation in Databricks

Priyag1
Honored Contributor II

Good data is essential for accurate and useful results. To get good data, the following tasks must be done:

  • Cleaning and formatting data - Handling missing values or outliers, ensuring data is in the correct format, and removing unneeded columns.
  • Preprocessing data - Numerical transformations, aggregating data, encoding text or image data, and creating new features.
  • Combining data - Joining tables or merging datasets.
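
The three tasks above can be sketched in a few lines of plain Python. This is only an illustration: the `orders` and `customers` data, the column names, the outlier cap of 100.0 and the `is_large` threshold are all made-up assumptions, and on Databricks you would normally express the same steps with DataFrames rather than lists of dicts.

```python
from statistics import median

# Hypothetical sample data: one missing amount, one extreme outlier,
# amounts stored as strings, and a 'note' column we do not need.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": "20.0", "note": "x"},
    {"order_id": 2, "customer_id": "c2", "amount": None, "note": "y"},
    {"order_id": 3, "customer_id": "c1", "amount": "30.0", "note": ""},
    {"order_id": 4, "customer_id": "c3", "amount": "10000.0", "note": ""},
]
customers = {"c1": "DE", "c2": "US", "c3": "US"}  # customer_id -> country


def clean(rows, cap):
    """Cleaning: cast amount to float, fill missing values with the median,
    cap outliers at `cap`, and drop the unneeded 'note' column."""
    known = [float(r["amount"]) for r in rows if r["amount"] is not None]
    fill = median(known)
    cleaned = []
    for r in rows:
        amount = float(r["amount"]) if r["amount"] is not None else fill
        cleaned.append({
            "order_id": r["order_id"],
            "customer_id": r["customer_id"],
            "amount": min(amount, cap),
        })
    return cleaned


def aggregate(rows):
    """Preprocessing: aggregate order amounts into a total per customer."""
    totals = {}
    for r in rows:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]
    return totals


def combine(totals, countries):
    """Combining: inner join of the totals with the customer table,
    plus a new feature 'is_large' (threshold is an arbitrary assumption)."""
    return [
        {"customer_id": cid, "country": countries[cid],
         "total": total, "is_large": total >= 50.0}
        for cid, total in sorted(totals.items())
        if cid in countries
    ]


prepared = combine(aggregate(clean(orders, cap=100.0)), customers)
for row in prepared:
    print(row)
```

Filling with the median rather than the mean is a deliberate choice here, since the mean would be pulled up by the outlier before it is capped.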

Data preparation resources

  1. Medallion lakehouse architecture - https://docs.databricks.com/lakehouse/medallion.html
  2. Delta Live Tables - https://docs.databricks.com/delta-live-tables/index.html
  3. Databricks Partner Connect - https://docs.databricks.com/partner-connect/prep.html
  4. Release notes - https://docs.databricks.com/release-notes/runtime/releases.html

4 REPLIES

Anonymous
Not applicable

Hi @Priyadarshini G

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

bharats
New Contributor III

Useful information. Hope you do more summarized posts on these concepts.

Sandro
New Contributor II

Great introduction. For some cases, I would add other dimensions of data quality, such as completeness of data and referential integrity validation.
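
As a rough illustration of those two checks, here is a minimal sketch in plain Python. The helper names `completeness` and `orphaned_keys`, the sample rows and the column names are all hypothetical; a real pipeline would run the equivalent checks against its own tables.

```python
def completeness(rows, column):
    """Fraction of rows where `column` is present and not None.
    An empty input is treated as fully complete (an assumption)."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)


def orphaned_keys(child_rows, fk, parent_keys):
    """Referential integrity: foreign-key values in the child rows
    that have no matching key in the parent table."""
    return sorted({
        r[fk] for r in child_rows
        if r[fk] is not None and r[fk] not in parent_keys
    })


# Hypothetical sample data: one missing amount, one unknown customer.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": "c9", "amount": None},
    {"order_id": 3, "customer_id": "c2", "amount": 5.0},
    {"order_id": 4, "customer_id": "c1", "amount": 7.5},
]
customer_ids = {"c1", "c2"}

amount_completeness = completeness(orders, "amount")
orphans = orphaned_keys(orders, "customer_id", customer_ids)
print(amount_completeness, orphans)
```

Returning the offending keys instead of a plain pass/fail flag makes it easier to trace which upstream records caused the violation.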

dplante
Contributor II

Data governance and data lineage are other things to call out.

Here's a cheat sheet that is also useful -> Data Preparation Cheatsheet
