@DataBricks_User9:
It sounds like you have a well-defined ETL framework with separate layers for processing data. Regarding your question about creating separate tables in the Trusted Layer for derived columns: it can be a good idea, because it centralizes the logic for calculating derived columns in one place, which makes that logic easier to maintain and update. It also avoids repeating the same calculations in multiple downstream queries, which can improve query performance and reduce the risk of inconsistent results.
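As a rough sketch of what that could look like (the table and column names here are hypothetical placeholders, not from your setup), you might materialize the derived column once in the Trusted Layer:

CREATE OR REPLACE TABLE trusted.orders_enriched AS
SELECT
  o.Id,
  o.Quantity,
  o.Unit_Price,
  o.Quantity * o.Unit_Price AS Total_Amount, -- derived column computed once here
  o.Modified_Date
FROM raw.orders o

Downstream queries can then read Total_Amount directly instead of each re-deriving it.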
As for refreshing the table incrementally, you can use the MERGE INTO statement in Delta Lake to perform an "upsert" based on the Modified_Date column: the Modified_Date in the child table tells you which rows need to be updated or inserted in the target table. Here's an example SQL statement you can use as a starting point:
MERGE INTO target_table t
USING (
  SELECT
    c.Id,
    concat_ws(',', collect_list(c.Product_Name)) AS Product_Names,
    MAX(c.Modified_Date) AS Modified_Date
  FROM child_table c
  JOIN parent_table p ON c.Id = p.Id
  -- re-aggregate every child row for any Id that changed since the last load;
  -- the target alias t is not visible inside USING, so the watermark is read
  -- with a scalar subquery over the target table instead
  WHERE c.Id IN (
    SELECT Id FROM child_table
    WHERE Modified_Date > (SELECT COALESCE(MAX(Modified_Date), TIMESTAMP '1900-01-01')
                           FROM target_table)
  )
  GROUP BY c.Id
) s
ON t.Id = s.Id
WHEN MATCHED THEN
  UPDATE SET t.Product_Name = s.Product_Names, t.Modified_Date = s.Modified_Date
WHEN NOT MATCHED THEN
  INSERT (Id, Product_Name, Modified_Date) VALUES (s.Id, s.Product_Names, s.Modified_Date)
In this example, the source query first finds every Id in the child table whose Modified_Date is newer than the latest Modified_Date already in the target table, so only new or updated records are processed. It then re-aggregates all child rows for those Ids (not just the changed rows, so the concatenated string stays complete), joins to the parent table, and uses collect_list with concat_ws to build the comma-separated Product_Name string. Finally, MERGE INTO updates or inserts rows in the target table based on the Id column. Note that collect_list does not guarantee ordering; wrap it in sort_array if you need stable output. Also note that the watermark approach means the target table needs its own Modified_Date column.
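For that watermark to work, the target table has to carry the Modified_Date column. A minimal sketch of the Delta table DDL, using the same placeholder names as above:

CREATE TABLE IF NOT EXISTS target_table (
  Id BIGINT,
  Product_Name STRING,    -- comma-separated list built by the MERGE above
  Modified_Date TIMESTAMP -- watermark for incremental processing
) USING DELTA

If you'd rather not add the column to the target itself, you can keep the watermark in a small control table instead and read it with the same kind of scalar subquery.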
I hope this helps! Let me know if you have any other questions.