<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Datamart creation in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/datamart-creation/m-p/95867#M39185</link>
    <description>&lt;P&gt;In a scenario where multiple teams access overlapping but not identical datasets from a shared data lake, is it better to create separate datamarts for each team (despite data redundancy) or to maintain a single datamart and use views for team-specific access? What are the trade-offs in terms of performance, maintenance, and scalability?&lt;/P&gt;</description>
    <pubDate>Thu, 24 Oct 2024 07:38:20 GMT</pubDate>
    <dc:creator>billykimber</dc:creator>
    <dc:date>2024-10-24T07:38:20Z</dc:date>
    <item>
      <title>Datamart creation</title>
      <link>https://community.databricks.com/t5/data-engineering/datamart-creation/m-p/95867#M39185</link>
      <description>&lt;P&gt;In a scenario where multiple teams access overlapping but not identical datasets from a shared data lake, is it better to create separate datamarts for each team (despite data redundancy) or to maintain a single datamart and use views for team-specific access? What are the trade-offs in terms of performance, maintenance, and scalability?&lt;/P&gt;</description>
      <pubDate>Thu, 24 Oct 2024 07:38:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/datamart-creation/m-p/95867#M39185</guid>
      <dc:creator>billykimber</dc:creator>
      <dc:date>2024-10-24T07:38:20Z</dc:date>
    </item>
    <item>
      <title>Re: Datamart creation</title>
      <link>https://community.databricks.com/t5/data-engineering/datamart-creation/m-p/95985#M39196</link>
      <description>&lt;P&gt;IMO there is no single best approach; it depends on the case, and both options have pros and cons.&lt;BR /&gt;If the differences between teams are small, views could be a solution.&lt;BR /&gt;On the other hand, a view is computed at query time, so on massive data each query can take a while.&lt;BR /&gt;Materialized views are one way around that, since their results are precomputed.&lt;BR /&gt;If the differences between teams are large, coding them into views might not be optimal.&lt;/P&gt;&lt;P&gt;Making separate datasets also makes sense, as you can optimize each one for its team. All the logic then also resides in a single place (and not in view definitions).&lt;BR /&gt;But this might be overkill for your situation.&lt;/P&gt;</description>
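      <!--
      The trade-off in the reply above (a view is recomputed on every query, while a
      separate per-team dataset is a stored copy that must be refreshed) can be sketched
      roughly as follows. This is only an illustration under stated assumptions, not the
      poster's setup: it uses Python's built-in sqlite3 module rather than Databricks, and
      the names sales, finance_view, finance_mart, build_demo, row_count and refresh_mart
      are all hypothetical. SQLite has no materialized views, so a plain table built with
      CREATE TABLE AS SELECT stands in for a materialized view or a separate datamart.

```python
import sqlite3


def build_demo():
    """Create a shared table plus two team-specific access paths (all names hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, team TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EU", "finance", 100.0), ("US", "finance", 250.0), ("EU", "marketing", 40.0)],
    )
    # Option A: a view. Nothing is stored; the filter runs on every query.
    conn.execute(
        "CREATE VIEW finance_view AS "
        "SELECT region, amount FROM sales WHERE team = 'finance'"
    )
    # Option B: a stored copy (stand-in for a materialized view or separate datamart).
    conn.execute(
        "CREATE TABLE finance_mart AS "
        "SELECT region, amount FROM sales WHERE team = 'finance'"
    )
    return conn


def row_count(conn, name):
    """Count rows in a table or view; name is trusted here because it is hard-coded."""
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


def refresh_mart(conn):
    """Rebuild the stored copy from the shared table (a full refresh, the simplest kind)."""
    conn.execute("DELETE FROM finance_mart")
    conn.execute(
        "INSERT INTO finance_mart "
        "SELECT region, amount FROM sales WHERE team = 'finance'"
    )
```

      After a new row is inserted into sales, finance_view reflects it immediately, whereas
      finance_mart stays unchanged until refresh_mart is called. That refresh step is the
      maintenance cost paid for cheaper reads and per-team optimization.
      -->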
      <pubDate>Thu, 24 Oct 2024 13:55:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/datamart-creation/m-p/95985#M39196</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-10-24T13:55:15Z</dc:date>
    </item>
  </channel>
</rss>

