We're encountering a specific issue in our DLT pipeline and would appreciate some advice. Here's an example to illustrate the challenge we're facing:
Tables Overview
- Material Master: Contains comprehensive material data updated daily with new records. We aim to upsert data into the silver layer using SCD Type 1.
- Material Description: Serves as a lookup table with descriptions for each material number, and gets updated once or twice a month. Similarly, we plan to upsert this data in the silver layer using SCD Type 1.
Pipeline Description
Our objective is to construct these tables in the silver layer through 1:1 mapping and upsert operations from our raw tables in the bronze layer. Subsequently, we join these tables to populate our core layer via another upsert operation. Our bronze table is a normal delta table.
Problem Statement
Currently, both tables are implemented as streaming tables within the silver layer of our DLT pipeline. When attempting a left join between these streaming tables, we encounter errors. This happens because streaming joins require specifying event timestamp intervals and watermarks. This is not applicable on the lookup table.
One alternative approach is to perform a Stream-to-Static join, which necessitates converting the lookup table (material_description) into a static table. However, we find no option in DLT to create non-streaming tables; even omitting the term "Streaming" results in the creation of a materialized view.
In the silver layer, we prefer all objects to be tables only, without resorting to solutions outside DLT, such as creating lookup tables as ordinary delta tables and executing merge operations in a separate notebook.
Could you provide insights or suggestions on how to overcome these limitations while remaining within the DLT framework?