01-21-2022 06:31 AM
My notebook reads Hive tables from DBFS that point to ADLS Gen1 file locations for their data (they are Delta tables), builds the feature table as a DataFrame within the notebook, and then calls the Feature Store client to save the feature table to the database I created for feature tables. When I call 'create_table', it successfully creates and saves the feature table to that database, and the table is viewable in the Feature Store. However, it does not record the data sources of the tables used to create the feature table, because the ADLS file paths are deemed invalid. The error message states that the table's path must be a valid DBFS path, even though the tables sit in DBFS (they just point to the Azure data lake).
Error message (I changed the actual file path to keep it confidential):
Exception: {'error_code': 'INVALID_PARAMETER_VALUE', 'message': 'Path name adl://****PATH_TO_DELTA_TABLE_IN_ADLS_GEN1_LAKE**** must be a valid dbfs:/ path.
Is there a way to get the Feature Store client to record the data sources of the tables used to create the feature table as those tables' actual ADLS file paths?
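To see why the check trips, it can help to confirm where each source table's data actually lives. This is a minimal sketch, assuming a Databricks notebook with an active `spark` session: Delta's `DESCRIBE DETAIL` reports a table's storage `location`, which for these tables would be the `adl://` path even though the tables are registered in the metastore. The `is_dbfs_path` and `table_location` helpers are hypothetical; `is_dbfs_path` only mirrors the rule stated in the error message and is not the Feature Store's actual validation code.

```python
from urllib.parse import urlparse


def is_dbfs_path(path: str) -> bool:
    """Hypothetical check mirroring the error text: the path must use the dbfs:/ scheme."""
    return urlparse(path).scheme.lower() == "dbfs"


def table_location(spark, table_name: str) -> str:
    """Return the storage location that DESCRIBE DETAIL reports for a Delta table."""
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}")
    return detail.select("location").first()["location"]


# Usage in a Databricks notebook (the table name is a placeholder):
# loc = table_location(spark, "my_db.my_source_table")
# print(loc, is_dbfs_path(loc))  # an adl:// location prints False
```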
01-21-2022 12:11 PM
Hello, @Jack Watson! My name is Piper and I'm a moderator for the Databricks community. Thank you for asking and welcome to the community!
Let's give the other members a chance to respond before we circle back to you. Thanks in advance for your patience.
01-28-2022 08:34 AM
Hi @Jack Watson, you can go through this link. It has everything you need about the Feature Store.
01-28-2022 02:45 PM
@Jack Watson We have encountered a similar issue since we upgraded to the most recent build. Code that used to work no longer does. Basically, if the Spark DataFrame is dynamically generated or backed by a cloud storage bucket, saving it fails. However, you can work around this by temporarily saving the DataFrame to DBFS first, loading it back, and then saving that to the Feature Store. @Kaniz Fatma This is clearly a bug: the same code used to run without error, and the code in the link you gave will simply fail if the data source is loaded from a cloud storage bucket.
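The save-to-DBFS-then-reload workaround can be sketched roughly as follows, assuming a Databricks notebook where `spark` is the active session and `fs` is a `FeatureStoreClient` from `databricks.feature_store`. The staging root `dbfs:/tmp/feature_store_staging` and the helper names are hypothetical; pick a DBFS location you control. Note that the data source recorded this way would be the DBFS staging path rather than the original ADLS path, so it sidesteps the error but does not record the ADLS locations the question asks about.

```python
def staging_path_for(table_name, staging_root="dbfs:/tmp/feature_store_staging"):
    """Build a DBFS staging path for a feature table name (hypothetical layout)."""
    safe_name = table_name.replace(".", "__")
    return f"{staging_root.rstrip('/')}/{safe_name}"


def create_table_via_dbfs(spark, fs, df, table_name, primary_keys, description=None):
    """Hypothetical helper: stage df on DBFS, reload it, then create the feature table."""
    staging_path = staging_path_for(table_name)
    # 1. Materialize the feature DataFrame as Delta under DBFS.
    #    "overwrite" replaces any earlier staged copy of the same table.
    df.write.format("delta").mode("overwrite").save(staging_path)
    # 2. Reload it so the new DataFrame reads from the dbfs:/ path, not adl://.
    staged_df = spark.read.format("delta").load(staging_path)
    # 3. Hand the reloaded DataFrame to the Feature Store client.
    return fs.create_table(
        name=table_name,
        primary_keys=primary_keys,
        df=staged_df,
        description=description,
    )


# Usage in a Databricks notebook (names are placeholders):
# from databricks.feature_store import FeatureStoreClient
# fs = FeatureStoreClient()
# create_table_via_dbfs(spark, fs, features_df, "feature_db.my_features", ["customer_id"])
```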
01-28-2022 06:46 PM
Hi @Jack Watson, thank you so much for flagging this. I'll look into it and get back to you. Thanks.
02-12-2022 08:14 AM
@Jack Watson Could you please confirm that the write is succeeding? If so, my understanding is that this is a warning from a validation step that we will be removing shortly. We'll likely remove the validation that runs when saving the data sources. Thanks.
02-12-2022 08:14 AM
We do not have an ETA at this moment, though.