02-01-2023 05:16 AM
Could someone explain the practical advantages of using a feature store vs. Delta Lake? Apparently they both work in the same manner, and the feature store does not seem to provide additional value. However, according to the documentation on the Databricks site, there are certain benefits to using the FS, which are still not tangible to me. What I really want to know is the major technical advantage, or selling point, of the FS over a Delta table.
With many thanks in advance!
02-01-2023 07:07 AM
Hi @Saeid Hedayati ,
One thing to note is that Databricks Feature Store tables are Delta tables; the difference is a different UI and some additional capabilities. So it's not necessarily FS vs. Delta, as the FS uses Delta.
The benefits of using the Databricks FS (as opposed to a standalone Delta table) come primarily from how easily and automatically it integrates with other Databricks features like MLflow. It also provides capabilities like lineage (i.e., which models and notebooks are using which features), automatic feature lookup for models, and the UI (and probably more that I'm forgetting).
I'm not sure if that answers your question. I can try to clarify more if needed.
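To make the automatic lookup and lineage points above a bit more concrete, here is a minimal pure-Python sketch of the idea. It does not use the Databricks Feature Store client; `FEATURE_TABLE`, `PackagedModel`, and `LineageRegistry` are hypothetical names invented for illustration, and the data and weights are made up. The point is only that the model is stored together with the names of the features it needs, so a caller passes primary keys and the features are joined in automatically, while a registry records which model consumes which feature.

```python
# Toy stand-in for a feature table keyed by customer_id (hypothetical data).
FEATURE_TABLE = {
    101: {"avg_basket": 42.0, "visits_30d": 3},
    102: {"avg_basket": 17.5, "visits_30d": 9},
}


class PackagedModel:
    """A linear model bundled with the names of the features it needs."""

    def __init__(self, name, weights, bias=0.0):
        self.name = name
        self.weights = weights  # feature name -> coefficient
        self.bias = bias

    @property
    def feature_names(self):
        return list(self.weights)

    def predict_from_keys(self, keys, feature_table):
        # The caller passes only primary keys; features are looked up here.
        predictions = []
        for key in keys:
            row = feature_table[key]
            score = sum(w * row[f] for f, w in self.weights.items())
            predictions.append(self.bias + score)
        return predictions


class LineageRegistry:
    """Records which models consume which features."""

    def __init__(self):
        self.consumers = {}  # feature name -> set of model names

    def register(self, model):
        for feature in model.feature_names:
            self.consumers.setdefault(feature, set()).add(model.name)


model = PackagedModel(
    "churn_v1", {"avg_basket": 0.1, "visits_30d": -0.5}, bias=1.0
)
registry = LineageRegistry()
registry.register(model)

# Inference needs only the IDs, not a hand-built feature vector.
predictions = model.predict_from_keys([101, 102], FEATURE_TABLE)
```

With a plain Delta table, the join from keys to features and the bookkeeping of which model uses which column would be code you write and maintain yourself; the sketch only shows what that bookkeeping amounts to.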
02-01-2023 07:44 AM
Hi @Landan George ,
Thanks for your prompt response and your detailed answers.
One point that I am not following is the integration with MLflow. Do you have a concrete example in which using MLflow with the feature store is easier than using it with a Delta Lake table?
One thing I thought might be a benefit of the feature store over a normal Delta Lake table is that the transformations from the feature preparation phase could be stored (for instance, the fitted one-hot encoder and min-max scaler from training). That way, once we deploy the model to production and get a new batch of unseen raw data, we would not need to load the pickled fitted functions and apply them to the new data. Is my assumption wrong?
Thanks!
02-01-2023 08:13 AM
Hi @Saeid Hedayati ,
I'd take a look at this sample notebook to see how easily MLflow integrates with FS. Regarding your second paragraph, that is indeed one of the benefits of using a feature store.
02-02-2023 06:59 AM
Hi @Landan George ,
Thank you for sharing this notebook and also your insight. Sorry for not being clear in my statement above. What I meant was this: if I have a preprocessing pipeline that applies some transformations to the data before training (such as one-hot encoding or scaling the numerical values), then we have to store the fitted preprocessing functions as pickles. Once a new batch of raw inference data arrives, we have to apply those transform functions to the raw data so that it has the same format as the training data. What I had assumed was that the FS saves the mapping of the transformation internally, so that when we get new raw data for inference we do not need to load those pickled functions and apply them; we could transform the data directly using FS capabilities. But I am assuming this functionality is not part of the feature store yet, right?
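For readers following along, the manual workflow described above (fit the transforms at training time, pickle them, reload and apply them to new raw data at inference time) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not Feature Store functionality: the helper functions `fit_min_max`, `apply_min_max`, `fit_one_hot`, and `apply_one_hot` are hypothetical stand-ins for a real scaler and encoder, and the columns and values are made up.

```python
import pickle


def fit_min_max(values):
    """Learn the min and max of a numeric column."""
    return {"lo": min(values), "hi": max(values)}


def apply_min_max(params, values):
    span = (params["hi"] - params["lo"]) or 1  # avoid division by zero
    return [(v - params["lo"]) / span for v in values]


def fit_one_hot(values):
    """Learn the sorted set of categories seen during training."""
    return {"categories": sorted(set(values))}


def apply_one_hot(params, values):
    # Categories not seen during training encode as all zeros.
    return [[int(v == c) for c in params["categories"]] for v in values]


# Training time: fit on the training data and persist the fitted parameters.
train_age = [20, 30, 40]
train_city = ["Berlin", "Paris", "Berlin"]
fitted = {"age": fit_min_max(train_age), "city": fit_one_hot(train_city)}
blob = pickle.dumps(fitted)  # in practice this would be written to a file

# Inference time: reload the fitted parameters and apply them to raw data.
restored = pickle.loads(blob)
new_age = apply_min_max(restored["age"], [25, 50])
new_city = apply_one_hot(restored["city"], ["Paris", "Rome"])
```

Note that the new batch is scaled with the training minimum and maximum, not its own, which is why the fitted state has to travel with the model in some form.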
02-01-2023 10:28 AM
This is a great blog post on feature stores: https://www.ethanrosenthal.com/2021/02/03/feature-stores-self-service/
Again, feature stores and Delta Lake are different technologies for different things. Our feature store does need to store data, and that's where Delta Lake is amazing. Delta Lake doesn't integrate with MLflow like our feature store does. The feature store does a better job of tracking lineage. The FS doesn't have a selling point; it's free to use in the Machine Learning Persona!
02-03-2023 12:00 AM
Hi @Joseph Kambourakis ,
Thank you for your response as well as sharing the blog post.
04-08-2023 08:13 PM
Hi @Saeid Hedayati
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!