In my opinion, the most reliable approach is to keep flexibility and control in separate layers: flexible at ingestion, strict everywhere downstream.
First, allow schema evolution only in the bronze layer. This layer should be treated as raw and flexible, where Auto Loader can adapt to upstream changes.
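As a rough sketch of what the bronze side could look like with Auto Loader: the paths, the JSON format, and the function name `start_bronze_stream` are placeholders I made up, and the `availableNow` trigger assumes a reasonably recent runtime. The `cloudFiles.*` options and `mergeSchema` are the standard ones.

```python
# Sketch only: the schema location and file format below are hypothetical.
BRONZE_READ_OPTIONS = {
    "cloudFiles.format": "json",
    # Auto Loader stores the inferred schema and its later versions here.
    "cloudFiles.schemaLocation": "/mnt/lake/_schemas/orders_bronze",
    # New upstream columns are added to the tracked schema;
    # the types of existing columns are not evolved.
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
}


def start_bronze_stream(spark, source_path, target_table, checkpoint_path):
    """Start an Auto Loader stream that appends raw files to a bronze table."""
    raw = (
        spark.readStream.format("cloudFiles")
        .options(**BRONZE_READ_OPTIONS)
        .load(source_path)
    )
    return (
        raw.writeStream
        .option("checkpointLocation", checkpoint_path)
        # Let the Delta sink accept the columns Auto Loader adds.
        .option("mergeSchema", "true")
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Keep in mind that in `addNewColumns` mode the stream stops when it first sees a new column and picks up the updated schema on restart, so it is worth running this under a job with retries.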
Second, enforce a strict schema from the silver layer onward. This prevents instability in merge operations and downstream transformations.
A pattern that works well:
- Bronze: ingest raw data with schema evolution enabled
- Intermediate step: normalize the schema by casting each column to its expected type, filling missing columns with nulls, and deciding explicitly which new columns to pass on
- Silver: apply merge logic using a stable and controlled schema
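The intermediate step can be sketched as a small helper that turns a silver contract into select expressions. The contract `SILVER_SCHEMA`, its column names, and `build_select_exprs` are hypothetical names for illustration, not part of any library.

```python
# Hypothetical silver contract: column name -> Spark SQL type.
SILVER_SCHEMA = {
    "order_id": "BIGINT",
    "customer_id": "STRING",
    "amount": "DECIMAL(18,2)",
    "updated_at": "TIMESTAMP",
}


def build_select_exprs(bronze_columns, target_schema):
    """Return one SQL expression per contract column, in contract order.

    Columns present in bronze are cast to the contract type, and columns
    missing from bronze become typed NULLs. Bronze columns that are not in
    the contract are not selected, so they never reach the merge.
    """
    present = set(bronze_columns)
    exprs = []
    for name, sql_type in target_schema.items():
        if name in present:
            exprs.append(f"CAST(`{name}` AS {sql_type}) AS `{name}`")
        else:
            exprs.append(f"CAST(NULL AS {sql_type}) AS `{name}`")
    return exprs


# Bronze has gained `coupon_code` and does not carry two contract columns.
exprs = build_select_exprs(["order_id", "amount", "coupon_code"], SILVER_SCHEMA)
```

On a real DataFrame this would be applied as `bronze_df.selectExpr(*exprs)` before the merge, so the merge source always has the same columns and types regardless of what bronze currently looks like.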
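For type changes, it is safer to handle them explicitly instead of relying on automatic evolution. An implicit change, such as a numeric column that starts arriving as a string, can make a merge fail or leave the same column holding inconsistent values across loads.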
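To illustrate what "explicit" can mean here, the sketch below casts a column with a dedicated function and sets aside the rows that cannot be cast instead of letting them through. It is plain Python on dictionaries to keep it self-contained; `to_decimal`, `split_by_cast`, and the sample rows are made up for the example.

```python
from decimal import Decimal, InvalidOperation


def to_decimal(value):
    """Cast an upstream amount (int, float, or numeric string) to Decimal."""
    if value is None:
        return None
    try:
        return Decimal(str(value))
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal: {value!r}") from exc


def split_by_cast(rows, column, caster):
    """Apply `caster` to `column`; return (clean rows, rejected rows)."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append({**row, column: caster(row.get(column))})
        except ValueError:
            # Keep the original row untouched so it can be inspected later.
            rejected.append(row)
    return clean, rejected


rows = [
    {"order_id": 1, "amount": 10},       # old upstream type: integer
    {"order_id": 2, "amount": "12.50"},  # new upstream type: string
    {"order_id": 3, "amount": "N/A"},    # cannot be cast
]
clean, rejected = split_by_cast(rows, "amount", to_decimal)
```

The same idea carries over to Spark: cast into a new column, route rows where the cast produced no value to a quarantine table, and merge only the clean set.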
For reprocessing, having the full raw data in bronze is critical. When a breaking change happens, you can update your transformation logic and replay the data without depending on the source system again.
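A toy version of such a replay, assuming the breaking change was a rename from `cust_id` to `customer_id` (both field names, `transform_v2`, and `replay` are invented for the example, and bronze rows are assumed to be in arrival order):

```python
def transform_v2(raw):
    """Updated logic: accepts the old `cust_id` field and its rename."""
    customer = raw.get("customer_id", raw.get("cust_id"))
    return {
        "order_id": int(raw["order_id"]),
        "customer_id": str(customer),
    }


def replay(bronze_rows, transform):
    """Rebuild silver from scratch; the latest record per key wins."""
    silver = {}
    for raw in bronze_rows:
        row = transform(raw)
        silver[row["order_id"]] = row
    return silver


# Raw history kept in bronze, spanning the rename.
bronze = [
    {"order_id": "1", "cust_id": 7},
    {"order_id": "2", "customer_id": "C-8"},
    {"order_id": "1", "customer_id": "C-7"},
]
silver = replay(bronze, transform_v2)
```

Because bronze still holds every raw record, only the transform had to change; nothing was requested from the source again.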
In production, I also recommend adding monitoring that alerts on schema changes as soon as they show up in bronze, instead of trying to fully automate recovery.
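A minimal sketch of such a check, comparing the schema you expect with the one currently observed in bronze. The function names and the rule that removed or retyped columns count as breaking are my assumptions; adjust the rule to your own contract.

```python
def diff_schemas(expected, observed):
    """Compare two {column: type} mappings and report the drift."""
    expected_cols, observed_cols = set(expected), set(observed)
    retyped = sorted(
        name
        for name in expected_cols & observed_cols
        if expected[name].lower() != observed[name].lower()
    )
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": retyped,
    }


def is_breaking(diff):
    """Assumed policy: added columns are tolerated, anything else is not."""
    return bool(diff["removed"] or diff["retyped"])


expected = {"order_id": "bigint", "amount": "decimal(18,2)", "status": "string"}
observed = {"order_id": "bigint", "amount": "string", "coupon_code": "string"}

drift = diff_schemas(expected, observed)
alert = is_breaking(drift)
```

With PySpark, the observed mapping can be built from a DataFrame as `{f.name: f.dataType.simpleString() for f in df.schema.fields}`, and the result can feed whatever alerting you already use.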
In summary:
- keep bronze flexible
- enforce contracts in silver
- handle breaking changes explicitly
- design for reprocessing