Data Engineering

What's the best way to manage multiple versions of the same datasets?

Kyle
New Contributor II

We have use cases that require multiple versions of the same datasets to be available. For example, we have a knowledge graph made up of entities and relations, and we keep multiple versions of it, which are currently distinguished by schema name:

knowledge_graph__1_3 (version 1.3)
    - entities
    - relations
knowledge_graph__1_4 (version 1.4)
    - entities
    - relations
knowledge_graph__2_1 (version 2.1)
    - entities
    - relations
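
The naming convention above can be sketched in Python as follows. This is only a minimal illustration of the name-based approach described in the question, assuming a `<base>__<major>_<minor>` pattern; the helper names (`schema_for_version`, `parse_schema_name`, `qualified_table`, `latest_schema`) are hypothetical and not part of any Databricks or Delta Lake API.

```python
import re
from typing import Iterable, Tuple

# Assumed base name, taken from the schema names listed above.
BASE_NAME = "knowledge_graph"

# Assumed pattern: <base>__<major>_<minor>, e.g. knowledge_graph__1_3.
_SCHEMA_PATTERN = re.compile(r"^(?P<base>.+)__(?P<major>\d+)_(?P<minor>\d+)$")


def schema_for_version(major: int, minor: int, base: str = BASE_NAME) -> str:
    """Build the schema name for a given dataset version."""
    return f"{base}__{major}_{minor}"


def parse_schema_name(schema: str) -> Tuple[str, int, int]:
    """Split a versioned schema name into (base, major, minor)."""
    match = _SCHEMA_PATTERN.match(schema)
    if match is None:
        raise ValueError(f"not a versioned schema name: {schema!r}")
    return match["base"], int(match["major"]), int(match["minor"])


def qualified_table(major: int, minor: int, table: str) -> str:
    """Return '<schema>.<table>' for one version of the dataset."""
    return f"{schema_for_version(major, minor)}.{table}"


def latest_schema(schemas: Iterable[str]) -> str:
    """Pick the schema with the highest (major, minor) version."""
    parsed = [parse_schema_name(s) for s in schemas]
    base, major, minor = max(parsed, key=lambda v: (v[1], v[2]))
    return schema_for_version(major, minor, base)
```

With helpers like these, callers refer to a version number instead of hard-coding schema names, for example `qualified_table(1, 4, "entities")` for the entities table of version 1.4.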

While we can make this work by encoding the version in the schema name, it doesn't feel elegant. I'm also aware of Delta Lake's versioning capabilities, but we would like to keep each version as a top-level artifact rather than as a historical version of a single table, as in the Delta Lake case. Are there any recommended best practices?


Anonymous
Not applicable

Howdy, @Kyle Gao. My name is Piper, and I'm a moderator for Databricks. Welcome to the community! Let's give the community some time to respond, and then we'll find an SME if we need to.

Thanks in advance for your patience. 🙂

-werners-
Esteemed Contributor III

Is it an option to add a version number? You did not mention the format in which the data is ultimately stored.

Kyle
New Contributor II

> is it an option to add a version number?

Where do you suggest adding the version number? We currently append it to the database name, but that doesn't feel very elegant.

The data is stored in Delta format, though I'm not sure how that's relevant.

Anonymous
Not applicable

Hey there @Kyle Gao,

Hope you are doing well. Thank you for posting your query.

Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you.

Cheers!
