Data preparation in Databricks

Priyag1

Good data is essential for accurate and useful results. To get good data, the following tasks must be done:

  • Cleaning and formatting data - Handling missing values or outliers, ensuring data is in the correct format, and removing unneeded columns.
  • Preprocessing data - Applying numerical transformations, aggregating data, encoding text or image data, and creating new features.
  • Combining data - Joining tables or merging datasets.
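A minimal sketch of the cleaning and formatting step. It is written in plain Python so it runs anywhere; on Databricks the same logic would more commonly be expressed with PySpark DataFrames or SQL. The records, the `clean` function, and the 0 to 120 valid age range are hypothetical, and replacing missing values and outliers with the median is only one possible policy.

```python
from statistics import median

# Hypothetical raw records: "age" arrives as a string and may be blank,
# and "notes" is a column we do not need downstream.
raw_rows = [
    {"id": "1", "age": "34", "notes": "x"},
    {"id": "2", "age": "", "notes": "y"},
    {"id": "3", "age": "29", "notes": "z"},
    {"id": "4", "age": "420", "notes": "typo"},
]

KEEP_COLUMNS = ("id", "age")
AGE_RANGE = (0, 120)  # assumed plausible bounds; values outside count as outliers


def clean(rows):
    """Drop unneeded columns, cast types, and impute missing or outlier ages."""
    lo, hi = AGE_RANGE
    cleaned = []
    for r in rows:
        # Remove unneeded columns by keeping only KEEP_COLUMNS.
        row = {k: r[k] for k in KEEP_COLUMNS}
        # Format: cast strings to int; a blank string becomes None (missing).
        row["id"] = int(row["id"])
        row["age"] = int(row["age"]) if row["age"].strip() else None
        cleaned.append(row)

    # The median of in-range ages is used as the replacement value.
    valid = [r["age"] for r in cleaned if r["age"] is not None and lo <= r["age"] <= hi]
    fill = median(valid)
    for r in cleaned:
        # Replace missing values and outliers with the median.
        if r["age"] is None or not lo <= r["age"] <= hi:
            r["age"] = fill
    return cleaned
```

For example, `clean(raw_rows)` drops the `notes` column, casts the strings to integers, and replaces both the blank age and the out-of-range value 420 with 31.5, the median of the remaining valid ages (34 and 29).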
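A plain-Python sketch of the preprocessing tasks, using hypothetical, already-cleaned sales records (on Databricks this work would usually be done with PySpark or SQL). It derives a new `revenue` feature, min-max scales `units` into the range [0, 1], one-hot encodes the text column `region`, and aggregates revenue per region. Min-max scaling and one-hot encoding are common choices, not the only ones, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical, already-cleaned sales records.
sales = [
    {"region": "east", "units": 10, "price": 2.0},
    {"region": "west", "units": 4, "price": 5.0},
    {"region": "east", "units": 6, "price": 2.5},
]


def preprocess(rows):
    """Return new rows with a derived feature, a scaled column, and one-hot columns."""
    # Creating a new feature: revenue derived from existing columns.
    # dict(r, ...) copies each row, so the input list is not mutated.
    rows = [dict(r, revenue=r["units"] * r["price"]) for r in rows]

    # Numerical transformation: min-max scale "units" into [0, 1].
    lo = min(r["units"] for r in rows)
    hi = max(r["units"] for r in rows)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    for r in rows:
        r["units_scaled"] = (r["units"] - lo) / span

    # Encoding text: one-hot encode the categorical "region" column.
    regions = sorted({r["region"] for r in rows})
    for r in rows:
        for name in regions:
            r[f"region_{name}"] = int(r["region"] == name)
    return rows


def revenue_by_region(rows):
    """Aggregation: total revenue per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["revenue"]
    return dict(totals)
```

With the sample data, `revenue_by_region(preprocess(sales))` returns `{'east': 35.0, 'west': 20.0}`.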
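A plain-Python sketch of combining data, assuming hypothetical `orders` and `customers` tables. The illustrative `left_join` keeps every order and attaches the matching customer columns (or `None` when there is no match), and `union` stacks datasets that share a schema. In PySpark the closest equivalents are `DataFrame.join` and `DataFrame.unionByName`; this code only illustrates the idea.

```python
# Hypothetical tables: orders reference customers by "customer_id".
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 30},
    {"order_id": 101, "customer_id": 3, "amount": 12},
]


def left_join(left, right, key):
    """Keep every left row; attach matching right columns, or None when unmatched."""
    index = {r[key]: r for r in right}  # assumes `key` is unique in `right`
    right_cols = sorted({c for r in right for c in r if c != key})
    joined = []
    for row in left:
        match = index.get(row[key])
        extra = {c: (match.get(c) if match is not None else None) for c in right_cols}
        # Right-side columns overwrite same-named left columns in this sketch.
        joined.append({**row, **extra})
    return joined


def union(*datasets):
    """Merge datasets that share a schema by stacking their rows."""
    return [row for ds in datasets for row in ds]
```

Here order 101 refers to customer 3, which does not exist, so its `name` is `None`; an inner join would drop that row instead.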

Data preparation resources

  1. Medallion lakehouse architecture - https://docs.databricks.com/lakehouse/medallion.html
  2. Delta Live Tables - https://docs.databricks.com/delta-live-tables/index.html
  3. Databricks Partner Connect - https://docs.databricks.com/partner-connect/prep.html
  4. Release notes - https://docs.databricks.com/release-notes/runtime/releases.html