Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

A single DLT for Ingest - feedback on this architecture

RDE305
New Contributor II

What are your thoughts on this Databricks pipeline design?

Different facilities will send me backups of a proprietary transactional database containing tens of thousands of tables. Each facility may differ in how these tables are populated or in their schemas.

  • Pre-Bronze: loop through the parquet files and create a registry of tables and schemas.

  • Bronze: one large ingest DLT with schema inference and schema evolution, parameterized by ADLS2 location. This table could end up being huge if there are tens of thousands of source tables, but it will capture every column regardless of schema. If table_a has col_a, col_b; table_b has col_a, col_c; and table_c has col_b, col_c, the resulting ingest would have the structure table_name, source_file, facility, col_a, col_b, col_c, ingest_timestamp.

  • Silver: separate DLTs per facility + table, so each can evolve its own cleaning logic. Ideally I would extract rows from the ingest and apply the maximal schema from the schema registry, plus cleaning logic and expectations, to individual silver tables (e.g., SILVER_NYC_SECURITIES_TRANSACTIONS, SILVER_LON_SECURITIES_TRANSACTIONS).

  • Gold: unified layer that combines all facilities for enterprise‑wide analytics (e.g., GOLD_SECURITIES_TRANSACTIONS).
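The Pre-Bronze registry and the Bronze superset schema from the example above can be sketched in plain Python. This is only an illustration of the column-union logic: all names are hypothetical, and in practice the schemas would be read from the parquet file footers by Spark or pyarrow, not hard-coded.

```python
# Hypothetical sketch of the Pre-Bronze registry and the Bronze superset schema.
# In a real pipeline the (table, columns) pairs would come from scanning the
# parquet files landed in ADLS2, not from a literal list.

METADATA_COLS = ["table_name", "source_file", "facility"]

def build_registry(table_schemas):
    """Merge observed per-facility schemas into one registry: table -> column set."""
    registry = {}
    for table, cols in table_schemas:
        registry.setdefault(table, set()).update(cols)
    return registry

def bronze_superset(registry):
    """Union of every table's columns, framed by the ingest metadata columns."""
    data_cols = sorted(set().union(*registry.values()))
    return METADATA_COLS + data_cols + ["ingest_timestamp"]

registry = build_registry([
    ("table_a", ["col_a", "col_b"]),
    ("table_b", ["col_a", "col_c"]),
    ("table_c", ["col_b", "col_c"]),
])
print(bronze_superset(registry))
# ['table_name', 'source_file', 'facility', 'col_a', 'col_b', 'col_c', 'ingest_timestamp']
```

This reproduces the structure described in the Bronze bullet: one wide table whose data columns are the superset of every source table's columns.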

Do you see this as a scalable, governed approach, or would you recommend a different pattern for balancing modularity, lineage clarity, and long‑term maintainability?
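For reference, the Silver step described above ("extract rows from the ingest and apply the maximal schema") could look like this stand-alone sketch. Again the names are hypothetical, and a real implementation would be a per-table DLT query over the Bronze table rather than Python dicts.

```python
# Hypothetical sketch of Silver extraction: pull one facility+table's rows out
# of the wide Bronze superset and project them onto that table's maximal
# schema from the Pre-Bronze registry, dropping the unused superset columns.

def silver_extract(bronze_rows, facility, table, registry):
    """Filter Bronze rows for one facility+table, keeping only registered columns."""
    cols = registry[table]
    return [
        {c: row.get(c) for c in cols}
        for row in bronze_rows
        if row["facility"] == facility and row["table_name"] == table
    ]

registry = {"securities_transactions": ["col_a", "col_b"]}
bronze_rows = [
    {"table_name": "securities_transactions", "facility": "NYC",
     "col_a": 1, "col_b": 2, "col_c": None},
    {"table_name": "securities_transactions", "facility": "LON",
     "col_a": 3, "col_b": 4, "col_c": None},
]
nyc = silver_extract(bronze_rows, "NYC", "securities_transactions", registry)
print(nyc)  # [{'col_a': 1, 'col_b': 2}]
```

Each facility+table pair would get its own such query (e.g., one feeding SILVER_NYC_SECURITIES_TRANSACTIONS), which is where the per-facility cleaning logic and expectations would attach.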

0 REPLIES