
I am saving a new feature table to the Databricks Feature Store, and it won't record the data sources of the tables used to create it, because they are Hive tables pointing to Delta tables in Azure Data Lake Storage Gen1

Jack_Watson
Contributor

My notebook pulls in Hive tables from DBFS that point to ADLS Gen1 file locations for their data (Delta tables), builds the feature table as a dataframe within the notebook, and then calls the Feature Store client to save the feature table to the database I created for feature tables. When I call 'create_table', it successfully creates and saves the feature table to the database, and it is viewable in the Feature Store. However, it does not record the data sources of the tables used to create the feature table: the ADLS file paths are deemed invalid, and the error message states that the path name for the table must be a valid dbfs:/ path, even though the tables sit in DBFS but point to the Azure data lake.
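For reference, a minimal sketch of the flow described above; the table names, columns, and keys are hypothetical stand-ins, since the real ones are confidential.

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Hive tables registered in the metastore whose storage locations are
# adl:// Delta paths; names and columns are hypothetical.
features_df = spark.sql("""
    SELECT o.customer_id, o.spend_30d, v.visits_30d
    FROM hive_db.orders o
    JOIN hive_db.visits v ON o.customer_id = v.customer_id
""")

# Creates and saves the feature table, but fails to record the adl://
# data sources (see the error below).
fs.create_table(
    name="feature_db.customer_features",
    primary_keys=["customer_id"],
    df=features_df,
    description="Features built from ADLS Gen1-backed Hive tables",
)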

Error message (actual file path redacted for confidentiality):

Exception: {'error_code': 'INVALID_PARAMETER_VALUE', 'message': 'Path name adl://****PATH_TO_DELTA_TABLE_IN_ADLS_GEN1_LAKE**** must be a valid dbfs:/ path.'}

I would like to know if there is a way to get the Feature Store client to record the data sources of the tables that create the feature table as their actual ADLS file paths.

ACCEPTED SOLUTION

Atanu
Databricks Employee

@Jack Watson Could you please confirm that the write is succeeding? If it is, then as I understand it this is only a warning from a validation step that saves the data source, and we will likely be removing that validation shortly. Thanks.
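(For anyone checking: one quick way to confirm the write itself succeeded despite the warning is to read the feature table back. The table name below is a hypothetical stand-in.)

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# read_table raises if the feature table was not created;
# otherwise it returns the stored rows.
check_df = fs.read_table(name="feature_db.customer_features")
print(check_df.count())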


REPLIES

Anonymous
Not applicable

Hello, @Jack Watson! My name is Piper and I'm a moderator for the Databricks community. Thank you for asking and welcome to the community!

Let's give the other members a chance to respond before we circle back to you. Thanks in advance for your patience.

virtualzx
New Contributor II

@Jack Watson We have encountered a similar issue since we upgraded to the most recent build. Code that used to work no longer does: if the Spark dataframe is dynamically generated or backed by a cloud storage bucket, the save fails. However, you can work around this by temporarily saving to DBFS first, then loading it back out and saving to the Feature Store, as sketched below. @Kaniz Fatma this is clearly a bug, as the same code used to run without error, and the code in the link you gave will simply fail if the data source is loaded from a cloud storage bucket.
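A minimal sketch of that workaround, assuming hypothetical table and path names; features_df stands for the dataframe built from the ADLS-backed Hive tables.

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Stage the dataframe to a DBFS-backed Delta location first, so the
# Feature Store client sees a dbfs:/ data source instead of adl://.
staging_path = "dbfs:/tmp/feature_staging/customer_features"
features_df.write.format("delta").mode("overwrite").save(staging_path)

# Reload so the dataframe's lineage points at the DBFS path.
staged_df = spark.read.format("delta").load(staging_path)

fs.create_table(
    name="feature_db.customer_features",
    primary_keys=["customer_id"],
    df=staged_df,
    description="Staged via DBFS to work around the adl:// validation",
)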


Atanu
Databricks Employee

Though we do not have an ETA at this moment.
