Hello Databricks Community,
I am currently using the Feature Engineering client and have a few questions about best practices for writing to Feature Store Tables.
I am considering not using the write_table method from the Feature Engineering client. Instead, I'm thinking of writing daily partitions directly to the underlying Delta table with an INSERT OVERWRITE statement and a PARTITION clause.
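For concreteness, here is a minimal, library-free sketch of the difference between the two write semantics. This is not the Databricks API (that would be fe.write_table(..., mode="merge") and Spark SQL INSERT OVERWRITE ... PARTITION); it just models a feature table as a plain Python dict keyed by primary key to illustrate the behaviour I'm asking about:

```python
# Simulation only -- NOT the Feature Store API. A "table" is a dict:
# primary key -> row; each row carries the partition date it belongs to.

def merge_write(table, batch):
    """write_table-style merge: upsert each incoming row by primary key.
    Rows already in the table but absent from the batch are kept."""
    for key, row in batch.items():
        table[key] = row  # insert new keys, update existing ones
    return table

def insert_overwrite_partition(table, batch, partition_date):
    """INSERT OVERWRITE ... PARTITION-style write: drop every existing row
    in the target partition, then insert the batch. Rows the batch no
    longer produces are silently lost."""
    table = {k: r for k, r in table.items() if r["date"] != partition_date}
    table.update(batch)
    return table

table = {
    "a": {"date": "2024-06-01", "value": 1},
    "b": {"date": "2024-06-01", "value": 2},
}
# A backfill batch for 2024-06-01 that no longer produces key "b":
batch = {"a": {"date": "2024-06-01", "value": 10}}

merged = merge_write(dict(table), dict(batch))
overwritten = insert_overwrite_partition(dict(table), dict(batch), "2024-06-01")

print(sorted(merged))       # "b" survives the merge
print(sorted(overwritten))  # "b" is gone after the partition overwrite
```

The dates, keys, and function names here are made up for illustration; the point is only that merge preserves rows the re-run no longer emits, while partition overwrite discards them.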
Before I proceed, I want to understand:
What are the potential consequences of not using the write_table function for Feature Store tables in this scenario? Specifically, is there any silent behaviour of Feature Store tables that I would lose or break by not writing with write_table (e.g. data not being properly catalogued, or other out-of-the-box Feature Store functionality)?
Is INSERT OVERWRITE a bad practice for managing daily partition updates in a Feature Store table?
On the one hand, I understand that using INSERT OVERWRITE may lead to data loss. Furthermore, write_table can help identify non-idempotent pipelines: suppose a daily run generated a set of records, and during a backfill one of those records is neither UPDATED nor INSERTED. That record must have been written by the prior daily run only, which suggests there might be an issue with the pipeline.
On the other hand, I may want to update the transformation code that generates a given partition and then OVERWRITE the data for a set of partitions; INSERT OVERWRITE solves that with ease by simply backfilling.
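The idempotency check I have in mind could be sketched like this: compare a backfill batch against what is already stored in the partition, and flag keys the current run no longer produces. The function name and data layout are my own invention (again a dict-based stand-in, not a Feature Store call):

```python
# Hypothetical helper, not part of any Databricks API: detect rows that a
# merge-mode backfill would neither UPDATE nor INSERT.

def find_orphans(table, batch, partition_date):
    """Keys present in the stored partition but absent from the backfill
    batch. Under merge semantics these rows stay untouched, hinting that
    the earlier run was not reproduced, i.e. the pipeline may not be
    idempotent."""
    stored = {k for k, r in table.items() if r["date"] == partition_date}
    return sorted(stored - set(batch))

table = {
    "a": {"date": "2024-06-01", "value": 1},
    "b": {"date": "2024-06-01", "value": 2},
}
batch = {"a": {"date": "2024-06-01", "value": 10}}  # re-run misses "b"

orphans = find_orphans(table, batch, "2024-06-01")
print(orphans)  # flag these keys for investigation before overwriting
```

With INSERT OVERWRITE such rows would simply disappear, which is exactly the silent data-loss risk versus the diagnostic signal I'm weighing.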
Would write_table be more suitable for ensuring that records are consistently inserted or updated during re-runs, for preventing data loss, and for surfacing idempotency issues in backfill scenarios?
Any advice on how to best handle this scenario would be greatly appreciated!
Thanks in advance for your insights.