Differences between Feature Store and Unity Catalog

Northp
New Contributor II

Our small team has just finished the data preparation phase of our project and started data analysis in Databricks. As we go deeper into this field, we're trying to understand the distinctions between a Feature Store and Unity Catalog, and when each is appropriate. On the surface, both seem to be ways to manage and use tables of data.

For example, consider churn prediction. We need to create one or more tables for churn prediction, to be joined later, containing multiple features that could influence a user's likelihood of churning. It seems we could simply create a new table in Unity Catalog with this information. However, I'm not sure whether this is the right approach or whether a Feature Store would be more suitable.

Here are my questions:

  1. In what situations would we prefer a Feature Store over Unity Catalog, or vice versa?

  2. If we were to use a Feature Store in this scenario, what would that look like in terms of setup and workflow?

  3. Is it possible to use the Feature Store and write directly to Unity Catalog? I've tried, but I get an error, and I can only write to the Hive metastore. This is confusing for dashboards and other uses.

I'd really appreciate any help; examples would be ideal. I'm very new to Databricks, so a real-world approach would be very helpful.

Accepted Solution

Vinay_M_R
Valued Contributor II

Hi @Northp, good day!

1.)  A Feature Store is a centralized repository that enables data scientists to find and share features, ensuring that the same code used to compute the feature values is used for model training and inference. It is particularly useful in machine learning workflows, where feature engineering is a crucial step. Databricks Feature Store offers several benefits, such as discoverability, lineage, integration with model scoring and serving, and point-in-time lookups.
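The point-in-time lookups mentioned above matter when feature values change over time: each training example should only see feature values that were already known when its label was observed, otherwise future information leaks into training. Below is a minimal plain-Python sketch of the idea. It does not use the Feature Store API (which handles this for you on time series feature tables); the feature `logins_last_30d`, the user IDs, and the dates are all hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history of one feature, "logins_last_30d":
# (user_id, date the value was computed, value).
feature_history = [
    ("u1", date(2023, 1, 1), 12),
    ("u1", date(2023, 2, 1), 3),
    ("u2", date(2023, 1, 1), 20),
    ("u2", date(2023, 3, 1), 18),
]

# Hypothetical labeled events: (user_id, event date, churned flag).
label_events = [
    ("u1", date(2023, 1, 15), 0),
    ("u1", date(2023, 2, 20), 1),
    ("u2", date(2023, 2, 10), 0),
    ("u3", date(2023, 2, 10), 0),  # u3 has no feature history at all
]


def build_index(history):
    """Group feature values by user, sorted by date, so they can be binary-searched."""
    index = {}
    for user_id, as_of, value in sorted(history, key=lambda r: (r[0], r[1])):
        dates, values = index.setdefault(user_id, ([], []))
        dates.append(as_of)
        values.append(value)
    return index


def point_in_time_lookup(index, user_id, event_date):
    """Return the latest value computed on or before event_date, or None.

    Ignoring values computed after event_date keeps future information
    out of the training example.
    """
    if user_id not in index:
        return None
    dates, values = index[user_id]
    pos = bisect_right(dates, event_date)  # number of values dated <= event_date
    return values[pos - 1] if pos > 0 else None


index = build_index(feature_history)
training_rows = [
    (user_id, event_date, point_in_time_lookup(index, user_id, event_date), churned)
    for user_id, event_date, churned in label_events
]
```

For the event ("u2", 2023-02-10), a naive "latest value" join would attach 18, a value computed after the event; the point-in-time lookup attaches 20 instead.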

On the other hand, Unity Catalog is a metastore service that provides a unified, secure, and fully managed metastore across all Databricks workspaces in an account. It supports various data formats, SQL functions, and Structured Streaming workloads, and it lets you manage the metastore's lifecycle and resources from the account console. However, it has some limitations: on clusters using shared access mode it does not support Scala, R, or workloads using Databricks Runtime for Machine Learning, and it does not support bucketing for Unity Catalog tables.

In summary, if your use case involves machine learning and requires a centralized repository for features, Databricks Feature Store would be the preferred choice. However, if you need a unified, secure, and fully managed metastore that supports various data formats and SQL functions, Unity Catalog would be more suitable.


2.) Here are the official resources on Feature Store setup and workflow:

https://www.databricks.com/p/ebook/the-comprehensive-guide-to-feature-stores

https://docs.databricks.com/machine-learning/feature-store/index.html
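For the churn example in the question, a Feature Store workflow roughly has three steps: compute the features once into a table keyed by a primary key (here `user_id`), build a training set by looking those features up for labeled rows, and apply the same lookup when scoring new users. The sketch below models those steps in plain Python, with dicts standing in for Delta tables. `compute_churn_features`, `Lookup`, `lookup_features`, and the session data are hypothetical; in Databricks the corresponding pieces are `FeatureStoreClient.create_table`/`write_table`, `FeatureLookup`, and `create_training_set` from the `databricks.feature_store` package, whose exact signatures are in the documentation linked above.

```python
from dataclasses import dataclass, field

# Hypothetical raw activity: user_id -> minutes spent in each session.
raw_sessions = {
    "u1": [30, 5, 0, 2],
    "u2": [45, 60, 50],
    "u3": [],
}


def compute_churn_features(sessions_by_user):
    """The single place where feature logic lives (stands in for the feature table)."""
    table = {}
    for user_id, minutes in sessions_by_user.items():
        table[user_id] = {
            "num_sessions": len(minutes),
            "avg_minutes": sum(minutes) / len(minutes) if minutes else 0.0,
        }
    return table


@dataclass
class Lookup:
    """Hypothetical stand-in for a feature lookup: which features to fetch, by which key."""
    lookup_key: str
    feature_names: list = field(default_factory=list)


def lookup_features(rows, feature_table, lookup):
    """Attach the requested features to each row by joining on the primary key."""
    out = []
    for row in rows:
        features = feature_table.get(row[lookup.lookup_key], {})
        enriched = dict(row)
        for name in lookup.feature_names:
            enriched[name] = features.get(name)  # None if the key has no features
        out.append(enriched)
    return out


feature_table = compute_churn_features(raw_sessions)
churn_lookup = Lookup(lookup_key="user_id", feature_names=["num_sessions", "avg_minutes"])

# Training: labeled rows carry only the key and the label; features come from the table.
labels = [{"user_id": "u1", "churned": 1}, {"user_id": "u2", "churned": 0}]
training_set = lookup_features(labels, feature_table, churn_lookup)

# Inference: the same lookup is applied to unlabeled rows, so features are computed identically.
to_score = [{"user_id": "u3"}, {"user_id": "u4"}]
scoring_set = lookup_features(to_score, feature_table, churn_lookup)
```

Because the training set and the scoring set both go through `churn_lookup`, a change to the feature logic reaches training and inference together; that consistency is the main thing a Feature Store adds on top of an ordinary table.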

3.) At this time, Feature Store does not support writing to a Unity Catalog metastore. In Unity Catalog-enabled workspaces, you can write feature tables only to the default Hive metastore.

Best Regards,

Vinay M R


Northp
New Contributor II

Thanks for the answer; it really helps a lot, and I now understand the difference between Unity Catalog and Feature Store much better.
