Hi @lawrence009, renaming and dropping columns are not Databricks-proprietary operations, but Databricks Delta Lake provides an enhanced implementation of them using column mapping. This feature makes renames and drops metadata-only changes: columns are marked as deleted or renamed without rewriting the underlying data files.
When column mapping is enabled for a Delta table, you rename and drop columns with the following SQL commands:
- To rename a column, use ALTER TABLE table_name RENAME COLUMN old_col_name TO new_col_name.
- To drop one or more columns, use ALTER TABLE table_name DROP COLUMN col_name or ALTER TABLE table_name DROP COLUMNS (col_name_1, col_name_2, ...).
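As a concrete illustration of the commands above, here is a minimal sketch. The table name my_table and all column names are hypothetical, and it assumes column mapping is already enabled on the table.

```sql
-- Hypothetical table and column names, for illustration only.
-- Assumes column mapping is already enabled on my_table.

-- Rename a single column (metadata-only change, no data files rewritten).
ALTER TABLE my_table RENAME COLUMN cust_nm TO customer_name;

-- Drop a single column.
ALTER TABLE my_table DROP COLUMN legacy_flag;

-- Drop several columns in one statement.
ALTER TABLE my_table DROP COLUMNS (tmp_col_1, tmp_col_2);
```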
Enabling column mapping also enables random file prefixes, so you can no longer explore the data by browsing Hive-style partition directories. Furthermore, enabling column mapping on a table might break downstream operations that rely on the Delta change data feed, and might break streaming reads that use the Delta table as a source.
Enabling column mapping for a table upgrades the table's Delta protocol version, and this protocol upgrade is irreversible. Column mapping requires Reader version 2 or above and Writer version 5 or above.
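For reference, a minimal sketch of enabling column mapping by setting table properties is shown below. The table name my_table is hypothetical. Because the protocol upgrade cannot be undone, it is worth trying this on a test copy of the table first.

```sql
-- Hypothetical table name, for illustration only.
-- This upgrades the table protocol and cannot be reversed.
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

-- Inspect the table properties afterwards to confirm the change.
SHOW TBLPROPERTIES my_table;
```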