Data preparation in Databricks
06-15-2023 07:52 PM - last edited on 06-26-2023 10:26 AM by christys
Good data is essential for accurate and useful results. To get good data, the following tasks must be done:
- Cleaning and formatting data - Handling missing values or outliers, ensuring data is in the correct format, and removing unneeded columns.
- Preprocessing data - Numerical transformations, aggregating data, encoding text or image data, and creating new features.
- Combining data - Joining tables or merging datasets.
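To make the three tasks concrete, here is a minimal sketch in plain Python so it runs without a cluster. In Databricks these steps would more typically be DataFrame operations. The sample records, the column names (`amount`, `note`, `region`), the median fill strategy, and the helper names `clean`, `add_features`, and `join_customers` are all assumptions made for illustration, not part of any Databricks API.

```python
from statistics import median

# Hypothetical sample records; None marks a missing value.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": "25.0", "note": "a"},
    {"order_id": 2, "customer_id": 11, "amount": None, "note": "b"},
    {"order_id": 3, "customer_id": 10, "amount": "35.0", "note": "c"},
]
customers = [
    {"customer_id": 10, "region": "EMEA"},
    {"customer_id": 11, "region": "APAC"},
]


def clean(rows):
    """Cleaning: cast amount to float, fill gaps with the median, drop 'note'."""
    known = [float(r["amount"]) for r in rows if r["amount"] is not None]
    fill = median(known) if known else 0.0  # assumed fallback when nothing is known
    return [
        {
            "order_id": r["order_id"],
            "customer_id": r["customer_id"],
            "amount": float(r["amount"]) if r["amount"] is not None else fill,
        }
        for r in rows
    ]


def add_features(rows, threshold=30.0):
    """Preprocessing: derive a boolean feature from a numeric column."""
    return [{**r, "is_large": r["amount"] >= threshold} for r in rows]


def join_customers(rows, customer_rows):
    """Combining: left join orders to customers on customer_id."""
    by_id = {c["customer_id"]: c for c in customer_rows}
    return [
        {**r, "region": by_id.get(r["customer_id"], {}).get("region")}
        for r in rows
    ]


prepared = join_customers(add_features(clean(orders)), customers)
for row in prepared:
    print(row)
```

Keeping each task in its own function makes it easy to test the steps separately and to swap the fill strategy or the join type later.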
Data preparation resources
- Medallion lakehouse architecture - https://docs.databricks.com/lakehouse/medallion.html
- Delta Live Tables - https://docs.databricks.com/delta-live-tables/index.html
- Databricks Partner Connect - https://docs.databricks.com/partner-connect/prep.html
- Release notes - https://docs.databricks.com/release-notes/runtime/releases.html
06-17-2023 02:40 AM
Hi @Priyadarshini G
Great to meet you, and thanks for your question!
Let's see if your peers in the community have an answer to your question. Thanks.
06-22-2023 10:35 PM
Useful information. Hope you do more summarized posts on these concepts.
06-28-2023 02:59 PM
Great introduction. For some cases, I would add other dimensions of data quality, such as completeness of data and referential integrity validation.
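As a rough illustration of those two checks, here is a small plain-Python sketch. The helper names `completeness` and `orphaned_keys`, the sample tables, and the column names are hypothetical. On a real lakehouse these checks would likely be expressed as table constraints or pipeline expectations instead.

```python
def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 1.0  # assumed convention: an empty table counts as complete
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)


def orphaned_keys(child_rows, parent_rows, key):
    """Key values used in child_rows that have no matching parent row."""
    parent_keys = {p[key] for p in parent_rows}
    return sorted({c[key] for c in child_rows if c[key] not in parent_keys})


# Hypothetical sample data: one missing amount, one order with no customer.
sample_orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": None},
    {"order_id": 3, "customer_id": 99, "amount": 40.0},
    {"order_id": 4, "customer_id": 10, "amount": 12.5},
]
sample_customers = [{"customer_id": 10}, {"customer_id": 11}]

amount_completeness = completeness(sample_orders, "amount")
missing_customers = orphaned_keys(sample_orders, sample_customers, "customer_id")
print(amount_completeness, missing_customers)
```

A check like this can run before a join, so orphaned rows are flagged instead of silently dropped or filled with nulls.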
07-15-2023 05:16 PM
Data governance and data lineage are other things to call out.
Here's a cheat sheet that is also useful -> Data Preparation Cheatsheet

