What are the best practices for storing VectorUDT columns in the Databricks Feature Store?

145625
New Contributor II


Hello, I am having trouble saving a table that has two columns containing VectorUDT data. I read that the Databricks Feature Store cannot handle this data type but can handle arrays, so I tried to convert the two columns into arrays.

When I tried to convert these columns to 'array<double>', either with .cast('array<double>') or with the function vector_to_array from pyspark.ml.functions, I consistently got the following error:

SparkException: Failed to execute user defined function(functions$$$Lambda$9020/747195126: (array<struct<type:tinyint,size:int,indices:array<int>,values:array<double>>>) => array<double>)

Caused by: IllegalArgumentException: function vector_to_array requires a non-null input argument and input type must be `org.apache.spark.ml.linalg.Vector` or `org.apache.spark.mllib.linalg.Vector`, but got scala.collection.mutable.WrappedArray$ofRef.
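The input type in the error, array<struct<type:tinyint,size:int,indices:array<int>,values:array<double>>>, is worth a closer look. That struct is the storage layout Spark ML uses for VectorUDT (a type code of 0 for sparse or 1 for dense, plus size, indices and values). The outer array<...>, together with the WrappedArray class named in the message, suggests that each row holds an array of vectors rather than a single vector, which vector_to_array does not accept. The pure-Python sketch below only illustrates what densifying that layout involves; it does not call Spark, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One serialized Spark ML vector, following the struct layout in the error message:
# (type, size, indices, values) where type 0 = sparse and type 1 = dense.
VectorStruct = Tuple[int, Optional[int], Optional[Sequence[int]], Sequence[float]]


def struct_to_dense(vec: VectorStruct) -> List[float]:
    """Densify one serialized vector into a plain list of floats."""
    vtype, size, indices, values = vec
    if vtype == 1:
        # Dense: the values field already holds every element.
        return [float(v) for v in values]
    if vtype == 0:
        # Sparse: scatter the stored values into a zero-filled list of length `size`.
        dense = [0.0] * size
        for i, v in zip(indices, values):
            dense[i] = float(v)
        return dense
    raise ValueError(f"unknown vector type code: {vtype}")


def array_of_structs_to_arrays(vecs: Sequence[VectorStruct]) -> List[List[float]]:
    """Handle a column typed array<vector>: densify each element separately."""
    return [struct_to_dense(v) for v in vecs]
```

Note that converting an array of vectors yields a nested array<array<double>>, not a flat array<double>.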

* What can I do to fix this error and convert my VectorUDT columns to arrays?

* Alternatively, is there a better practice for storing VectorUDT data in the Databricks Feature Store?

Thank you for your help!
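A possible workaround, sketched under assumptions: Spark 3.0 or later (where pyspark.ml.functions.vector_to_array is available) and pyspark.ml vectors rather than the older pyspark.mllib ones. The idea is to check each column's type first: a plain VectorUDT column can go through vector_to_array, while an array of vectors (which the array<struct<...>> input type in the error above points to) is densified element by element in a Python UDF. The helper names vectors_to_lists and vector_columns_to_arrays are hypothetical, and whether the Feature Store accepts the resulting nested array<array<double>> column has not been verified here.

```python
from typing import Iterable, List, Optional


def vectors_to_lists(vectors) -> Optional[List[List[float]]]:
    """Python UDF body (hypothetical helper): list of ML vectors -> nested float lists.

    Each element is expected to expose toArray(), as DenseVector and SparseVector do.
    Nulls are passed through so the UDF does not fail on rows without vectors.
    """
    if vectors is None:
        return None
    return [[float(x) for x in v.toArray()] for v in vectors]


def vector_columns_to_arrays(df, columns: Iterable[str]):
    """Replace vector-typed columns with array columns (hypothetical helper)."""
    # Spark imports are kept inside the function so the helper above
    # can be loaded and tested without a Spark installation.
    from pyspark.ml.functions import vector_to_array
    from pyspark.ml.linalg import VectorUDT
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DoubleType

    nested_to_lists = F.udf(vectors_to_lists, ArrayType(ArrayType(DoubleType())))

    for name in columns:
        dtype = df.schema[name].dataType
        if isinstance(dtype, VectorUDT):
            # One vector per row: the built-in converter works directly.
            df = df.withColumn(name, vector_to_array(F.col(name)))
        elif isinstance(dtype, ArrayType) and isinstance(dtype.elementType, VectorUDT):
            # An array of vectors per row: vector_to_array rejects it,
            # so each element is densified inside the Python UDF instead.
            df = df.withColumn(name, nested_to_lists(F.col(name)))
        else:
            raise TypeError(f"column {name!r} has unsupported type {dtype.simpleString()}")
    return df
```

Printing df.schema[name].dataType for the two columns before converting should show which branch applies.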

