Datamart creation

billykimber — Thu, 24 Oct 2024 07:38:20 GMT

In a scenario where multiple teams access overlapping but not identical datasets from a shared data lake, is it better to create separate datamarts for each team (despite the data redundancy) or to maintain a single datamart and use views for team-specific access? What are the trade-offs in terms of performance, maintenance, and scalability?

Re: Datamart creation

-werners- — Thu, 24 Oct 2024 13:55:15 GMT

IMO there is no single best approach. It depends on the case, and both options have pros and cons.
If the differences between teams are really small, views could be a solution.
On the other hand, if you work on massive data, a view has to be computed at query time, so this can take a while.
In that case you could use materialized views, which store the precomputed result.
If the differences between teams are big, coding them into view definitions might not be optimal.

Making separate datasets also makes sense as you can optimize each one. Also all logic resides in a single place (and not in view definitions).
But this might be overkill for your situation.
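
To make the trade-off above a bit more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption for illustration: the sales table, the finance_v view, the finance_mart table and the sample rows are hypothetical. SQLite has no materialized views, so CREATE TABLE AS stands in here for a materialized view or a separate per-team datamart; a real lakehouse engine would have its own syntax and refresh mechanism.

```python
import sqlite3


def build_demo(conn):
    # Hypothetical shared dataset that several teams read from.
    conn.execute("CREATE TABLE sales (region TEXT, team TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("EU", "finance", 100.0),
            ("EU", "marketing", 40.0),
            ("US", "finance", 250.0),
            ("US", "marketing", 60.0),
        ],
    )
    # Option A: a plain view. Nothing is stored; the aggregation is
    # recomputed every time the view is queried.
    conn.execute(
        "CREATE VIEW finance_v AS "
        "SELECT region, SUM(amount) AS total FROM sales "
        "WHERE team = 'finance' GROUP BY region"
    )
    # Option B: a stored copy (stand-in for a materialized view or a
    # separate datamart). Fast to read, but it must be refreshed.
    conn.execute(
        "CREATE TABLE finance_mart AS "
        "SELECT region, SUM(amount) AS total FROM sales "
        "WHERE team = 'finance' GROUP BY region"
    )


def totals(conn, relation):
    # relation is a trusted, hard-coded name in this sketch.
    rows = conn.execute(f"SELECT region, total FROM {relation} ORDER BY region")
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
build_demo(conn)

# New data arrives in the shared table after the mart was built.
conn.execute("INSERT INTO sales VALUES ('EU', 'finance', 50.0)")

view_totals = totals(conn, "finance_v")     # reflects the new row
mart_totals = totals(conn, "finance_mart")  # stale until rebuilt
```

In this sketch the view always reflects the current source data but pays the aggregation cost on every read, while the stored copy reads cheaply but stays at the old EU total until it is rebuilt. That refresh step is the extra maintenance you take on with materialized views or separate datamarts.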
