What are your thoughts on this Databricks pipeline design?
Different facilities will send me backups of a proprietary transactional database containing tens of thousands of tables. Each facility may differ in how those tables are populated, or in the table schemas themselves.
Pre-Bronze: loop through the parquet files and build a registry of tables and their schemas.
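Roughly what I'm picturing for the registry step; a minimal sketch that assumes a Databricks job where dbutils/spark already exist, one directory per source table under each facility, and placeholder names for the landing path and the ops.schema_registry table:

```python
# Placeholder names throughout: the landing account/container, the
# one-directory-per-source-table layout, and ops.schema_registry are
# assumptions, not settled design.
from pyspark.sql import functions as F

LANDING_ROOT = "abfss://landing@myaccount.dfs.core.windows.net"  # placeholder

def register_schemas(facility: str) -> None:
    rows = []
    for entry in dbutils.fs.ls(f"{LANDING_ROOT}/{facility}"):
        if not entry.isDir():
            continue
        table_name = entry.name.rstrip("/")
        # schema comes from the parquet footers, so no full scan is needed
        schema = spark.read.parquet(entry.path).schema
        rows.append((facility, table_name, schema.json()))
    (spark.createDataFrame(rows, "facility string, table_name string, schema_json string")
          .withColumn("captured_at", F.current_timestamp())
          .write.mode("append")
          .saveAsTable("ops.schema_registry"))
```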
Bronze: one large ingest DLT with schema inference and schema evolution, parameterized by ADLS Gen2 location. This table could end up being huge if there are tens of thousands of source tables, but it will ingest every column regardless of schema. If table_a has col_a and col_b, table_b has col_a and col_c, and table_c has col_b and col_c, the resulting ingest table would have the structure table_name, source_file, facility, col_a, col_b, col_c, ingest_timestamp.
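Something like this for the Bronze DLT (the pipeline parameter name, the facility/table/file.parquet path layout, and the regexes that derive facility and table_name from the path are my assumptions):

```python
# Rough sketch of the single wide Bronze table; Auto Loader's
# schemaEvolutionMode lets the target keep the union of all columns it sees.
import dlt
from pyspark.sql import functions as F

LANDING_ROOT = spark.conf.get("landing_root")  # set as a DLT pipeline parameter

@dlt.table(name="bronze_ingest")
def bronze_ingest():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
        .load(f"{LANDING_ROOT}/*/*")  # assumed layout: one directory per facility, then per table
        .withColumn("source_file", F.col("_metadata.file_path"))
        # derive facility / table_name from the last path segments (assumes <facility>/<table>/<file>.parquet)
        .withColumn("facility", F.regexp_extract("source_file", r"/([^/]+)/[^/]+/[^/]+$", 1))
        .withColumn("table_name", F.regexp_extract("source_file", r"/([^/]+)/[^/]+$", 1))
        .withColumn("ingest_timestamp", F.current_timestamp())
    )
```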
Silver: separate DLTs per facility + table, so each can evolve its own cleaning logic. Ideally I would extract each table's rows from the Bronze ingest and apply the maximal schema from the schema registry, plus per-table cleaning logic and expectations, to produce individual Silver tables (e.g. SILVER_NYC_SECURITIES_TRANSACTIONS, SILVER_LON_SECURITIES_TRANSACTIONS).
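For Silver I'd metaprogram the tables off the registry, along these lines (the expectation shown, the registry column names, and the silver_facility_table naming are illustrative, not real rules):

```python
# Sketch of generating one Silver DLT table per (facility, table) from the
# registry. The expectation below is a placeholder; real rules would come
# from per-table config.
import dlt
import json
from pyspark.sql import functions as F
from pyspark.sql.types import StructType

registry = spark.table("ops.schema_registry").collect()  # small driver-side list

def make_silver(facility: str, table_name: str, schema_json: str) -> None:
    target = f"silver_{facility}_{table_name}".lower()
    # maximal column set for this table, as recorded pre-Bronze
    cols = [f.name for f in StructType.fromJson(json.loads(schema_json)).fields]

    @dlt.table(name=target)
    @dlt.expect_or_drop("not_null_key", "col_a IS NOT NULL")  # placeholder rule
    def _silver():
        return (
            dlt.read_stream("bronze_ingest")
            .where((F.col("facility") == facility) & (F.col("table_name") == table_name))
            .select("facility", "source_file", "ingest_timestamp", *cols)
        )

for row in registry:
    make_silver(row.facility, row.table_name, row.schema_json)
```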
Gold: a unified layer that combines all facilities for enterprise-wide analytics (e.g. GOLD_SECURITIES_TRANSACTIONS).
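And Gold would just be a union by name of the per-facility Silver tables, e.g.:

```python
# Minimal sketch of the unified Gold table for one logical entity. The
# facility list would really come from config or the registry; it's
# hard-coded here just to show the shape.
import dlt
from functools import reduce

FACILITIES = ["nyc", "lon"]  # placeholder

@dlt.table(name="gold_securities_transactions")
def gold_securities_transactions():
    parts = [dlt.read(f"silver_{f}_securities_transactions") for f in FACILITIES]
    # allowMissingColumns tolerates facilities whose Silver table lacks some optional columns
    return reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), parts)
```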
Do you see this as a scalable, governed approach, or would you recommend a different pattern for balancing modularity, lineage clarity, and long‑term maintainability?