Machine Learning

Differences between Feature Store and Unity Catalog

Northp
New Contributor II

Our small team has just finished the data preparation phase of our project and started data analysis in Databricks. As we go deeper into this field, we're trying to understand the distinctions between Feature Store and Unity Catalog and when each is appropriate. On the surface, it seems to me that both are ways to manage and use tables of data.

For example, let's consider churn prediction. We need to create one or more tables for churn prediction, to be joined later, containing multiple features that could influence a user's likelihood of churning. It appears to me that we could just create a new table in Unity Catalog with this information. However, I am not sure if this is the right approach or if a Feature Store would be more suitable.

Here are my questions:

  1. In what situations would we prefer to use a Feature Store over Unity Catalog, or vice versa?

  2. If we were to use a Feature Store in this scenario, what would that look like in terms of setup and workflow?

  3. Is it possible to use the Feature Store and write directly to Unity Catalog? I've tried, but it gives me an error, and I could only write to the Hive Metastore. This is a bit confusing for the dashboard and other uses.

I'd really appreciate any help, ideally with examples. I'm very new to Databricks, and a real-world approach would be very helpful.

1 ACCEPTED SOLUTION

Accepted Solutions

Vinay_M_R
Databricks Employee

Hi @Northp  Good day!

1.)  A Feature Store is a centralized repository that enables data scientists to find and share features, ensuring that the same code used to compute the feature values is used for model training and inference. It is particularly useful in machine learning workflows, where feature engineering is a crucial step. Databricks Feature Store offers several benefits, such as discoverability, lineage, integration with model scoring and serving, and point-in-time lookups.
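To make "point-in-time lookups" concrete for the churn example, here is a minimal plain-Python sketch. It is not the Databricks Feature Store API, and the customer IDs, dates, and the `logins_last_30d` feature are made up for illustration. The idea is that each training label is joined only to the feature value that was known on or before the label's date, so later information does not leak into training.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical feature history: customer_id -> [(as_of_date, logins_last_30d)],
# sorted by as_of_date. All names and values are illustrative assumptions.
feature_history = {
    "c1": [(date(2023, 1, 1), 12), (date(2023, 2, 1), 5), (date(2023, 3, 1), 0)],
    "c2": [(date(2023, 2, 15), 7)],
}


def point_in_time_lookup(customer_id, event_date):
    """Return the latest feature value recorded on or before event_date, else None."""
    rows = feature_history.get(customer_id, [])
    dates = [as_of for as_of, _ in rows]
    idx = bisect_right(dates, event_date)  # number of rows with as_of <= event_date
    return rows[idx - 1][1] if idx > 0 else None


# Label rows: (customer_id, observation_date, churned)
labels = [
    ("c1", date(2023, 2, 10), 0),
    ("c1", date(2023, 3, 5), 1),
    ("c2", date(2023, 1, 20), 0),
]

# Each label is paired with the feature value as of its own observation date.
training_rows = [
    (cid, ts, point_in_time_lookup(cid, ts), churned)
    for cid, ts, churned in labels
]
```

In this sketch the label for `c2` gets no feature value, because nothing had been recorded for that customer by the observation date. A plain key-only join would have pulled in the later value instead.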

On the other hand, Unity Catalog is a metastore service that provides a unified, secure, and fully managed metastore across all Databricks workspaces in an account. It supports various data formats, SQL functions, and structured streaming workloads, and it lets you manage the metastore lifecycle and resources from the account console. It has limitations, though: clusters in shared access mode do not support Scala, R, or workloads using the Machine Learning Runtime, and Unity Catalog tables do not support bucketing.

In summary, if your use case involves machine learning and requires a centralized repository for features, Databricks Feature Store would be the preferred choice. However, if you need a unified, secure, and fully managed metastore that supports various data formats and SQL functions, Unity Catalog would be more suitable.


2.) The official docs below cover Feature Store setup and workflow:

https://www.databricks.com/p/ebook/the-comprehensive-guide-to-feature-stores

https://docs.databricks.com/machine-learning/feature-store/index.html
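As a rough picture of the workflow those docs describe, here is a small in-memory sketch for the churn case. It is plain Python rather than the Databricks client: the dictionary `feature_table`, the helper `lookup`, and the sample data are all hypothetical stand-ins. On Databricks, the corresponding steps go through `FeatureStoreClient` (for example `create_table`, `create_training_set`, and `score_batch`); check the docs above for the exact usage.

```python
# Made-up raw events; in practice this would be a Spark DataFrame.
raw_events = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 35.0},
    {"customer_id": "c2", "amount": 5.0},
]


def compute_features(events):
    """Single feature definition, reused for both training and inference."""
    table = {}
    for event in events:
        row = table.setdefault(
            event["customer_id"], {"n_orders": 0, "total_spend": 0.0}
        )
        row["n_orders"] += 1
        row["total_spend"] += event["amount"]
    return table


# Step 1: compute features once and publish them keyed by the primary key.
feature_table = compute_features(raw_events)

FEATURES = ["n_orders", "total_spend"]


def lookup(customer_ids, feature_names):
    """Fetch feature vectors by primary key (stand-in for a feature lookup)."""
    return [[feature_table[cid][name] for name in feature_names] for cid in customer_ids]


# Step 2: build the training set by joining labels to features on the key.
labels = {"c1": 0, "c2": 1}  # customer_id -> churned
X_train = lookup(list(labels), FEATURES)
y_train = list(labels.values())

# Step 3: at inference time the caller supplies only keys; the same lookup
# returns the same feature definitions that were used for training.
X_score = lookup(["c2"], FEATURES)
```

The point of the design is that training and scoring both go through one lookup against one published table, so the feature logic cannot drift between the two.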

 

3.) At this time, Feature Store does not support writing to a Unity Catalog metastore. In Unity Catalog-enabled workspaces, you can write feature tables only to the default Hive metastore.

Best Regards,

Vinay M R


2 REPLIES

Northp
New Contributor II

Thanks for the answer, it really helps a lot. I now understand the difference between Unity Catalog and Feature Store much better.
