What are the practical advantages of Feature Store compared to Delta Lake?

Saeid_H
Contributor

Could someone explain the practical advantages of using a feature store vs. Delta Lake? Apparently they both work in the same manner, and the feature store does not seem to provide additional value. However, according to the documentation on the Databricks page, there are certain benefits of using FS, which are not tangible to me. What I really want to know is the major technical advantage or selling point of FS over a Delta table.

With many thanks in advance!

7 REPLIES

LandanG
Honored Contributor

Hi @Saeid Hedayati​ ,

One thing to note is that Databricks Feature Store tables are Delta tables; the difference is a different UI and some additional capabilities. So it's not necessarily FS vs. Delta, as the FS uses Delta.

The benefits of using the Databricks FS (as opposed to a standalone Delta table) come primarily from how easily and automatically it integrates with other Databricks features like MLflow. It also provides capabilities like lineage (i.e., which models and notebooks are using which features), automatic feature lookup for models, and the UI (and probably more that I'm forgetting).

I'm not sure if that answers your question. I can try to clarify more if needed

Hi @Landan George​ ,

Thanks for your prompt response and your detailed answers.

One point that I am not following is the integration with MLflow. Do you have a concrete example in which using MLflow with the feature store is easier than using a Delta Lake table?

One benefit I thought the feature store might have over a normal Delta Lake table is that the transformations from the feature preparation phase could be stored (for instance, the fitted transform functions of a one-hot encoder and a min-max scaler from training). Then, once we deploy the model to production and get a new batch of unseen raw data, we would not need to load the pickled fitted functions and apply them to the new data. Is my assumption wrong?

thanks!

LandanG
Honored Contributor

Hi @Saeid Hedayati​ ,

I'd take a look at this sample notebook to see how easily MLflow integrates with FS. Regarding your second paragraph, that is indeed one of the benefits of using a feature store.
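As a rough sketch of what the MLflow integration looks like in code, assuming the `databricks.feature_store` client that ships with the Databricks ML runtime: the function names `train_and_log` and `score`, and every table, column, and model name passed in, are hypothetical placeholders, and this is not taken from the sample notebook itself.

```python
def train_and_log(labels_df, feature_table, feature_names, key, label, model_name):
    """Train on Feature Store features and log the model with its lookup metadata."""
    # Local imports: these packages are assumed to come from the Databricks ML runtime.
    import mlflow.sklearn
    from databricks.feature_store import FeatureLookup, FeatureStoreClient
    from sklearn.linear_model import LinearRegression

    fs = FeatureStoreClient()

    # Declare which features to join onto the label DataFrame, and on which key.
    lookups = [
        FeatureLookup(
            table_name=feature_table,
            feature_names=feature_names,
            lookup_key=key,
        )
    ]
    training_set = fs.create_training_set(
        df=labels_df,              # Spark DataFrame holding the key and label columns
        feature_lookups=lookups,
        label=label,
        exclude_columns=[key],     # keep the join key out of the model inputs
    )
    pdf = training_set.load_df().toPandas()
    model = LinearRegression().fit(pdf.drop(columns=[label]), pdf[label])

    # Logging through the Feature Store client packages the lookup metadata
    # with the MLflow model, which is what enables lineage and automatic lookup.
    fs.log_model(
        model=model,
        artifact_path="model",
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name=model_name,
    )
    return model


def score(model_uri, batch_df):
    """Batch inference: batch_df only needs the lookup key column(s)."""
    from databricks.feature_store import FeatureStoreClient

    return FeatureStoreClient().score_batch(model_uri, batch_df)
```

Because the model is logged together with its training set, `score_batch` only needs the lookup keys and joins the features from the feature table itself. With a plain Delta table, that join would have to be written and kept in sync by hand at inference time.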

Hi @Landan George​ ,

Thank you for sharing this notebook and your insight. Sorry for not being clear in my statement above. What I meant was: if I have a preprocessing pipeline that applies transformations (such as one-hot encoding or scaling numerical values) to the data before training, then we have to store the fitted preprocessing functions as pickles. Once a new batch of raw inference data arrives, we have to apply those transform functions to the raw data so that it has the same format as the training data. What I had thought was that the FS saves the mapping of the transformation internally, so that when we get new raw data for inference we would not need to load the pickled functions and apply them, and could instead transform the data directly using FS capabilities. But I am assuming this functionality is not part of the feature store yet, right?
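For reference, the manual workflow described above (fit the transforms at training time, pickle them, then reload and apply them at inference) can be sketched in plain Python. The helper names `fit_preprocessor` and `transform` and the toy rows are made up for illustration; in practice the fitted objects would more likely be scikit-learn transformers, and nothing here uses the Feature Store.

```python
import pickle


def fit_preprocessor(rows, cat_key, num_key):
    """Learn the one-hot vocabulary and the min-max range from training rows."""
    values = [row[num_key] for row in rows]
    return {
        "cat_key": cat_key,
        "num_key": num_key,
        "categories": sorted({row[cat_key] for row in rows}),
        "min": min(values),
        "max": max(values),
    }


def transform(params, rows):
    """Apply previously fitted parameters to new raw rows."""
    span = (params["max"] - params["min"]) or 1.0  # guard against a constant column
    out = []
    for row in rows:
        # Categories unseen during training encode as all zeros.
        one_hot = [
            1.0 if row[params["cat_key"]] == category else 0.0
            for category in params["categories"]
        ]
        scaled = (row[params["num_key"]] - params["min"]) / span
        out.append(one_hot + [scaled])
    return out


# Training time: fit on the training data and persist the fitted state.
train_rows = [{"color": "red", "size": 10.0}, {"color": "blue", "size": 30.0}]
blob = pickle.dumps(fit_preprocessor(train_rows, "color", "size"))

# Inference time: reload the fitted state and apply it to unseen raw data.
params = pickle.loads(blob)
features = transform(params, [{"color": "red", "size": 20.0}])
print(features)  # [[0.0, 1.0, 0.5]]
```

The point of the sketch is that the fitted state (vocabulary, minimum, maximum) has to travel with the model somehow, whether as a separate pickle or bundled into the logged model artifact.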

Anonymous
Not applicable

This is a great blog post on feature stores: https://www.ethanrosenthal.com/2021/02/03/feature-stores-self-service/

Again, feature stores and Delta Lake are different technologies for different things. Our feature store does need to store data, and that's where Delta Lake is amazing. Delta Lake doesn't integrate with MLflow like our feature store does. The feature store does a better job of tracking lineage. The FS doesn't have a selling point; it's free to use in the Machine Learning persona!

Hi @Joseph Kambourakis​ ,

Thank you for your response as well as sharing the blog post.

