Lu_Wang_ENB_DBX
Databricks Employee

Here are the answers to your questions:

  1. Is Delta’s restriction a design decision?
    Yes. In Delta, mergeSchema is mainly for schema evolution by adding columns; type changes are still controlled by schema enforcement unless the change qualifies for type widening. If the mismatch does not meet type-widening conditions, Delta follows normal enforcement rules instead of silently changing the column type.
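As a rough mental model only (this is not Delta's actual engine code), the append-time decision can be sketched like this; the function name, return strings, and the widening-pair set are illustrative:

```python
# Illustrative sketch of Delta's append-time type decision, not real engine code.
def append_type_decision(source_type: str, target_type: str,
                         widening_enabled: bool) -> str:
    # A few widening pairs Delta supports automatically (target -> source).
    supported_widening = {("INT", "BIGINT"), ("SMALLINT", "INT"),
                          ("FLOAT", "DOUBLE")}
    if source_type == target_type:
        return "append as-is"
    if widening_enabled and (target_type, source_type) in supported_widening:
        return "widen target column"
    # Anything else falls back to schema enforcement: the write fails
    # rather than silently changing the column type.
    return "enforce schema (fail)"
```

Note that INT -> STRING lands in the enforcement branch even with widening enabled, which is the crux of the question.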

  2. Can append mode handle type changes without pre-casting or full overwrite?
    Yes, but only for supported widening changes such as INT -> BIGINT, and only when the target table has delta.enableTypeWidening = true and schema evolution is enabled on the write.
    For INT -> STRING, there is no supported automatic widening path; the docs explicitly list it as an unsupported data type change in Auto Loader, where such values are rescued instead of widened.
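A minimal append sketch under those conditions (assumes a Delta target on a runtime that supports type widening; the table and DataFrame names are placeholders):

```python
# Hedged sketch: append that lets a compatible change (e.g. INT -> BIGINT)
# widen the target column instead of failing. Names are illustrative.
WIDENING_WRITE_OPTIONS = {"mergeSchema": "true"}  # schema evolution on the write

def append_with_widening(df, target_table: str) -> None:
    # Precondition: the target table was created or altered with
    # TBLPROPERTIES ('delta.enableTypeWidening' = 'true').
    (df.write.format("delta")
        .mode("append")
        .options(**WIDENING_WRITE_OPTIONS)
        .saveAsTable(target_table))
```

Both pieces are required: the table property opts the table in, and mergeSchema opts the individual write in.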

  3. What Git repo code result shows the intended pattern?
    A GitHub code file in databrickslabs/lakebridge uses the exact Delta pattern of enabling type widening first, then altering column types:

sqls: list | None = [
  f"ALTER TABLE {table_identifier} SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')",
  f"ALTER TABLE {table_identifier} ALTER COLUMN recon_metrics.row_comparison.missing_in_source TYPE BIGINT",
  f"ALTER TABLE {table_identifier} ALTER COLUMN recon_metrics.row_comparison.missing_in_target TYPE BIGINT",
]
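Adapting that pattern to an arbitrary table could look like the sketch below; `type_widening_ddl` is a hypothetical helper, and in a notebook each returned statement would be run with spark.sql:

```python
def type_widening_ddl(table: str, column: str, new_type: str) -> list[str]:
    """Return the two DDL statements: enable widening first, then widen the column."""
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')",
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}",
    ]

# e.g. in a notebook:
#   for stmt in type_widening_ddl("main.silver.orders", "qty", "BIGINT"):
#       spark.sql(stmt)
```

The ordering matters: the TBLPROPERTIES statement must succeed before the ALTER COLUMN, or the type change is rejected by enforcement.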
  4. What is the recommended production approach for dynamic schema drift?
    Use a Bronze/Silver pattern with Auto Loader. By default, Auto Loader is designed to avoid breaking on type mismatches: for text formats it infers columns as STRING, and with rescue modes it places unsupported type-change values into the rescued data column instead of failing the pipeline.
    If you want automatic widening for compatible changes, set the schema evolution mode to addNewColumnsWithTypeWidening and delta.enableTypeWidening=true on the target; unsupported changes like INT -> STRING should be rescued/quarantined and normalized downstream rather than forced into the Delta target schema during append.
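A Bronze-layer Auto Loader sketch with that mode; the paths are placeholders, and cloudFiles.schemaEvolutionMode is the option that takes addNewColumnsWithTypeWidening:

```python
# Hedged sketch of a Bronze ingest: new columns get added, compatible type
# changes get widened, and unsupported changes land in _rescued_data.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/mnt/bronze/_schemas/events",  # placeholder path
    "cloudFiles.schemaEvolutionMode": "addNewColumnsWithTypeWidening",
    "rescuedDataColumn": "_rescued_data",
}

def bronze_stream(spark, source_path: str):
    # Returns a streaming DataFrame; write it to the Bronze Delta table
    # (which should carry delta.enableTypeWidening = true).
    return (spark.readStream.format("cloudFiles")
            .options(**AUTOLOADER_OPTIONS)
            .load(source_path))
```

The Silver layer then reads Bronze, inspects _rescued_data, and applies explicit casts, so drift never breaks the ingest itself.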

  Summary

  • New columns → use mergeSchema.
  • Widenable type changes → enable type widening and keep append mode.
  • Non-widening changes like INT -> STRING → do not rely on Delta mergeSchema. Land the raw data, rescue the bad values, and reconcile/cast in a downstream layer, or explicitly alter/overwrite the table schema when you choose to accept the change.