Data preparation in Databricks
06-15-2023 07:52 PM (last edited on 06-26-2023 10:26 AM by christys)
Good data is essential for accurate and useful results. Getting good data involves the following tasks:
- Cleaning and formatting data: handling missing values or outliers, ensuring data is in the correct format, and removing unneeded columns.
- Preprocessing data: applying numerical transformations, aggregating data, encoding text or image data, and creating new features.
- Combining data: joining tables or merging datasets.
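The cleaning and formatting task can be sketched in plain Python to show what each sub-step does. This is an illustrative sketch, not the Databricks or Spark API: the rows, the column names (`id`, `amount`, `country`, `debug_note`), the outlier cutoff, and the choice of median imputation are all hypothetical assumptions.

```python
# Plain-Python sketch of data cleaning (NOT the Spark API).
# All column names, values, and the cutoff below are hypothetical.
from statistics import median

raw_rows = [
    {"id": "1", "amount": "12.5", "country": "us", "debug_note": "x"},
    {"id": "2", "amount": None, "country": "US", "debug_note": "y"},
    {"id": "3", "amount": "9999", "country": " de ", "debug_note": "z"},
    {"id": "4", "amount": "7.0", "country": "DE", "debug_note": ""},
]

MAX_AMOUNT = 1000.0  # hypothetical outlier cutoff


def clean(rows):
    typed = []
    for row in rows:
        # Format: cast types and normalize country codes.
        # Only the needed columns are copied, so debug_note is dropped.
        typed.append({
            "id": int(row["id"]),
            "amount": float(row["amount"]) if row["amount"] is not None else None,
            "country": row["country"].strip().upper(),
        })
    # Outliers: drop rows above the cutoff (missing amounts are kept for now).
    kept = [r for r in typed if r["amount"] is None or r["amount"] <= MAX_AMOUNT]
    # Missing values: impute with the median of the remaining known amounts.
    fill = median(r["amount"] for r in kept if r["amount"] is not None)
    for r in kept:
        if r["amount"] is None:
            r["amount"] = fill
    return kept


cleaned = clean(raw_rows)
```

In a Databricks notebook the same steps would more typically be written with PySpark DataFrame methods such as `drop`, `filter`, and `fillna`; whether to drop or impute outliers and missing values depends on the data.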
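The preprocessing task (numerical transformations, aggregation, encoding, and new features) can be illustrated with a small plain-Python sketch. It is not Spark code; the columns, the min-max scaling, the `log1p` feature, and one-hot encoding are example choices assumed for illustration.

```python
# Plain-Python sketch of preprocessing (NOT the Spark API).
# Column names and the specific transformations are hypothetical choices.
import math
from collections import defaultdict

rows = [
    {"country": "US", "amount": 10.0},
    {"country": "US", "amount": 30.0},
    {"country": "DE", "amount": 20.0},
]


def preprocess(rows):
    amounts = [r["amount"] for r in rows]
    lo, hi = min(amounts), max(amounts)
    categories = sorted({r["country"] for r in rows})
    features = []
    for r in rows:
        feat = {
            # Numerical transformation: min-max scale amount into [0, 1].
            "amount_scaled": (r["amount"] - lo) / (hi - lo) if hi > lo else 0.0,
            # New feature: log(1 + amount).
            "log_amount": math.log1p(r["amount"]),
        }
        # Encoding text: one-hot encode the country column.
        for c in categories:
            feat[f"country_{c}"] = 1 if r["country"] == c else 0
        features.append(feat)
    return features


def total_by_country(rows):
    # Aggregation: sum amount per country.
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)


feats = preprocess(rows)
totals = total_by_country(rows)
```

With PySpark, aggregation is usually done with `groupBy(...).agg(...)`, and `pyspark.ml.feature` provides `StringIndexer` and `OneHotEncoder` for categorical encoding.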
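Combining data by joining tables can be sketched as a simple hash-based left join in plain Python. The `orders` and `customers` tables and the `customer_id` key are hypothetical, and the sketch assumes the key is unique in the right-hand table.

```python
# Plain-Python sketch of a left join (NOT the Spark API).
# Table names, columns, and keys are hypothetical.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 12.5},
    {"order_id": 2, "customer_id": "c2", "amount": 7.0},
    {"order_id": 3, "customer_id": "c9", "amount": 3.0},
]
customers = [
    {"customer_id": "c1", "country": "US"},
    {"customer_id": "c2", "country": "DE"},
]


def left_join(left, right, key):
    # Index the right table by key (assumes keys are unique there).
    index = {r[key]: r for r in right}
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    joined = []
    for row in left:
        match = index.get(row[key])
        merged = dict(row)
        # Unmatched left rows keep their values and get None for right columns.
        for c in right_cols:
            merged[c] = match[c] if match is not None else None
        joined.append(merged)
    return joined


result = left_join(orders, customers, "customer_id")
```

In PySpark the equivalent is `orders_df.join(customers_df, on="customer_id", how="left")`, and datasets that share a schema can be merged with `unionByName`.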
Data preparation resources
- Medallion lakehouse architecture - https://docs.databricks.com/lakehouse/medallion.html
- Delta Live Tables - https://docs.databricks.com/delta-live-tables/index.html
- Databricks Partner Connect - https://docs.databricks.com/partner-connect/prep.html
- Release notes - https://docs.databricks.com/release-notes/runtime/releases.html