<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What's the best way to manage multiple versions of the same datasets? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27991#M19829</link>
    <description>&lt;P&gt;Howdy, @Kyle Gao. My name is Piper, and I'm a moderator for Databricks. Welcome to the community! Let's give the community some time to respond, and then we'll find an SME if we need to.&lt;/P&gt;&lt;P&gt;Thanks in advance for your patience. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 16 Feb 2022 16:23:00 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2022-02-16T16:23:00Z</dc:date>
    <item>
      <title>What's the best way to manage multiple versions of the same datasets?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27990#M19828</link>
      <description>&lt;P&gt;We have use cases that require multiple versions of the same datasets to be available. For example, we have a knowledge graph made of entities and relations, and its versions are currently distinguished by schema name:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;knowledge_graph__1_3 (version 1.3)
    - entities
    - relations
knowledge_graph__1_4 (version 1.4)
    - entities
    - relations
knowledge_graph__2_1 (version 2.1)
    - entities
    - relations&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;While we can shoehorn our use cases into this by encoding the version in the name, it doesn't feel elegant. I'm also aware of Delta Lake's versioning capabilities, but we would like each version to be a top-level artifact rather than a historical version of a single table, as it would be with Delta Lake. Are there any recommended best practices?&lt;/P&gt;</description>
      <pubDate>Tue, 15 Feb 2022 21:57:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27990#M19828</guid>
      <dc:creator>Kyle</dc:creator>
      <dc:date>2022-02-15T21:57:15Z</dc:date>
    </item>
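If the schema-per-version convention from the question is kept, a small helper can at least make it consistent. A minimal sketch, assuming versions are `major.minor` strings and schema names follow the `knowledge_graph__MAJOR_MINOR` pattern listed in the question; `schema_for`, `version_of`, and `latest` are hypothetical helpers, not part of any Databricks API.

```python
import re

# Hypothetical convention mirroring the question: version 1.3 lives in the
# schema knowledge_graph__1_3, version 2.1 in knowledge_graph__2_1, and so on.
PREFIX = "knowledge_graph"
SCHEMA_RE = re.compile(r"^" + PREFIX + r"__(\d+)_(\d+)$")


def schema_for(version: str) -> str:
    """Map a 'major.minor' version string to its schema name."""
    major, minor = (int(part) for part in version.split("."))
    return f"{PREFIX}__{major}_{minor}"


def version_of(schema: str) -> tuple[int, int]:
    """Parse a versioned schema name back into a (major, minor) tuple."""
    match = SCHEMA_RE.match(schema)
    if match is None:
        raise ValueError(f"not a versioned schema: {schema}")
    return int(match.group(1)), int(match.group(2))


def latest(schemas: list[str]) -> str:
    """Return the versioned schema with the highest version.

    Non-matching names are ignored; raises ValueError if none match.
    """
    versioned = [s for s in schemas if SCHEMA_RE.match(s)]
    return max(versioned, key=version_of)


if __name__ == "__main__":
    names = ["knowledge_graph__1_3", "knowledge_graph__2_1", "knowledge_graph__1_4"]
    print(latest(names))  # knowledge_graph__2_1
```

Comparing versions as integer tuples rather than as strings matters once a minor version reaches two digits: compared as plain strings, `knowledge_graph__1_10` sorts before `knowledge_graph__1_9`.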
    <item>
      <title>Re: What's the best way to manage multiple versions of the same datasets?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27991#M19829</link>
      <description>&lt;P&gt;Howdy, @Kyle Gao. My name is Piper, and I'm a moderator for Databricks. Welcome to the community! Let's give the community some time to respond, and then we'll find an SME if we need to.&lt;/P&gt;&lt;P&gt;Thanks in advance for your patience. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Feb 2022 16:23:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27991#M19829</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-02-16T16:23:00Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to manage multiple versions of the same datasets?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27992#M19830</link>
      <description>&lt;P&gt;Is it an option to add a version number? You did not mention the format in which the data is ultimately stored.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 07:18:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27992#M19830</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-02-17T07:18:06Z</dc:date>
    </item>
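One possible reading of the version-number suggestion, sketched here under stated assumptions: keep a single Delta table per dataset with a hypothetical `kg_version` column, and expose each release through its own view so that every version still appears as a top-level object. The schema, column, and view names are illustrative only, and the snippet merely builds the SQL text.

```python
# Illustrative names only: one Delta table per dataset under a single schema,
# with a hypothetical kg_version column identifying the release of each row.
SCHEMA = "knowledge_graph"
DATASETS = ("entities", "relations")


def view_name(dataset: str, version: str) -> str:
    """Map ("entities", "1.4") to knowledge_graph.entities_v1_4."""
    return f"{SCHEMA}.{dataset}_v{version.replace('.', '_')}"


def version_view_sql(dataset: str, version: str) -> str:
    """Build SQL that exposes one release of a dataset as its own view."""
    return (
        f"CREATE OR REPLACE VIEW {view_name(dataset, version)} AS "
        f"SELECT * FROM {SCHEMA}.{dataset} WHERE kg_version = '{version}'"
    )


def release_sql(version: str) -> list[str]:
    """Return one CREATE VIEW statement per dataset for a given release."""
    return [version_view_sql(dataset, version) for dataset in DATASETS]


if __name__ == "__main__":
    for statement in release_sql("1.4"):
        print(statement)
```

In a Databricks notebook, each statement could then be executed with `spark.sql(statement)`. Because the version string is interpolated directly into the SQL, this is only appropriate for trusted, internally generated version values.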
    <item>
      <title>Re: What's the best way to manage multiple versions of the same datasets?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27993#M19831</link>
      <description>&lt;P&gt;&amp;gt; is it an option to add a version number?&lt;/P&gt;&lt;P&gt;Where would you suggest adding the version number? We currently append it to the database name, but that doesn't feel very elegant.&lt;/P&gt;&lt;P&gt;The data is stored in Delta format, though I'm not sure how that's relevant.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 18:49:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27993#M19831</guid>
      <dc:creator>Kyle</dc:creator>
      <dc:date>2022-02-17T18:49:05Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to manage multiple versions of the same datasets?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27994#M19832</link>
      <description>&lt;P&gt;Have you explored Delta's Change Data Feed (CDF)? &lt;A href="https://docs.databricks.com/delta/delta-change-data-feed.html" target="test_blank"&gt;https://docs.databricks.com/delta/delta-change-data-feed.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 02:05:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27994#M19832</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-03-02T02:05:10Z</dc:date>
    </item>
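For reference, a sketch of what using the Change Data Feed looks like, assuming a Databricks runtime where CDF is available. CDF is enabled per table with the `delta.enableChangeDataFeed` table property, and the recorded row-level changes can be read with the `table_changes` SQL function between two table versions. Note that CDF describes what changed between commits of one table; it does not by itself turn a historical version into a separate top-level table, which is what the question asks for. The helpers below only build SQL strings, and the table name and version numbers are placeholders.

```python
# Hypothetical helpers that only build SQL text; run the results with spark.sql()
# on a Databricks runtime. Table names and versions are placeholders.


def enable_cdf_sql(table: str) -> str:
    """Turn on the Change Data Feed for an existing Delta table.

    Only commits made after this property is set are recorded in the feed.
    """
    return f"ALTER TABLE {table} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"


def changes_sql(table: str, start_version: int, end_version: int) -> str:
    """Read row-level changes committed between two table versions (inclusive).

    Each returned row carries _change_type, _commit_version and _commit_timestamp.
    """
    if start_version > end_version:
        raise ValueError("start_version must not exceed end_version")
    return f"SELECT * FROM table_changes('{table}', {start_version}, {end_version})"


if __name__ == "__main__":
    print(enable_cdf_sql("knowledge_graph.entities"))
    print(changes_sql("knowledge_graph.entities", 2, 5))
```

The DataFrame equivalent of the second query is `spark.read.format("delta").option("readChangeFeed", "true").option("startingVersion", 2).option("endingVersion", 5).table("knowledge_graph.entities")`.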
    <item>
      <title>Re: What's the best way to manage multiple versions of the same datasets?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27995#M19833</link>
      <description>&lt;P&gt;Hey there @Kyle Gao,&lt;/P&gt;&lt;P&gt;Hope you are doing well, and thank you for posting your question.&lt;/P&gt;&lt;P&gt;I just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Cheers!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2022 16:00:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-manage-multiple-versions-of-the-same/m-p/27995#M19833</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-04-27T16:00:03Z</dc:date>
    </item>
  </channel>
</rss>

