Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What's the best way to manage multiple versions of the same datasets?

Kyle
New Contributor II

We have use cases that require multiple versions of the same dataset to be available at once. For example, we have a knowledge graph made of entities and relations, and multiple versions of that knowledge graph are currently distinguished by schema name:

knowledge_graph__1_3 (version 1.3)
    - entities
    - relations
knowledge_graph__1_4 (version 1.4)
    - entities
    - relations
knowledge_graph__2_1 (version 2.1)
    - entities
    - relations

While we can shoehorn the use cases into a naming scheme that encodes the version, it doesn't feel elegant. I'm also aware of Delta Lake's versioning capabilities, but we would like each version to be a top-level artifact rather than a historical version of a single table, as it would be with Delta Lake. Are there any recommended best practices?
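
For readers following along, the naming convention described above can be kept in one place rather than hand-typed in every query. The following is only a minimal sketch of that idea, not a recommended Databricks practice: the helper names (`schema_name`, `table_name`, `parse_version`, `latest`) are hypothetical, and the `knowledge_graph__<major>_<minor>` pattern is inferred from the three schema names listed in the post.

```python
import re
from typing import Iterable, Tuple

# Base name taken from the schema names in the post.
BASE = "knowledge_graph"


def schema_name(version: str, base: str = BASE) -> str:
    """Map a dotted version such as '1.3' to 'knowledge_graph__1_3'."""
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"not a dotted numeric version: {version!r}")
    return f"{base}__{version.replace('.', '_')}"


def table_name(version: str, table: str, base: str = BASE) -> str:
    """Build a schema-qualified table name, e.g. 'knowledge_graph__1_3.entities'."""
    return f"{schema_name(version, base)}.{table}"


def parse_version(schema: str, base: str = BASE) -> Tuple[int, ...]:
    """Recover the version as a tuple of ints from a versioned schema name."""
    match = re.fullmatch(rf"{re.escape(base)}__(\d+(?:_\d+)*)", schema)
    if match is None:
        raise ValueError(f"not a versioned {base} schema: {schema!r}")
    return tuple(int(part) for part in match.group(1).split("_"))


def latest(schemas: Iterable[str], base: str = BASE) -> str:
    """Pick the schema with the highest version, comparing numerically."""
    return max(schemas, key=lambda s: parse_version(s, base))
```

In a notebook, the resulting string could be passed to something like `spark.table(table_name("1.4", "entities"))`. Comparing versions as integer tuples rather than strings means a hypothetical `1_10` would sort after `1_9`.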


Anonymous
Not applicable

Howdy, @Kyle Gao​. My name is Piper, and I'm a moderator for Databricks. Welcome to the community! Let's give the community some time to respond, and then we'll find an SME if we need to.

Thanks in advance for your patience. 🙂

-werners-
Esteemed Contributor III

Is it an option to add a version number? You did not mention the format in which the data is ultimately stored.

Kyle
New Contributor II

> Is it an option to add a version number?

Where do you suggest adding the version number? We append the version number to the database name right now, but it doesn't feel very elegant.

The data is stored in Delta format, though I'm not sure how that's relevant.

Anonymous
Not applicable

Hey there @Kyle Gao​ 

Hope you are doing well. Thank you for posting your query.

Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you.

Cheers!
