
What are the best practices for storing VectorUDT columns in the Databricks Feature Store?

145625
New Contributor II

Hello, I am having trouble storing VectorUDT columns in the Databricks Feature Store. I saw that the Feature Store cannot handle this data type but can store arrays, so I tried converting my VectorUDT columns into arrays as a workaround.

However, when I use .cast('array<double>') or the vector_to_array function from pyspark.ml.functions, I consistently get the following error:

SparkException: Failed to execute user defined function(functions$$$Lambda$9020/747195126: (array<struct<type:tinyint,size:int,indices:array<int>,values:array<double>>>) => array<double>)

Caused by: IllegalArgumentException: function vector_to_array requires a non-null input argument and input type must be `org.apache.spark.ml.linalg.Vector` or `org.apache.spark.mllib.linalg.Vector`, but got scala.collection.mutable.WrappedArray$ofRef.
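
For reference, here is a minimal sketch of what I am doing. The DataFrame, column names, and toy data below are simplified placeholders that only illustrate the shape of my real table; on my real data the second step raises the SparkException above:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.linalg import Vectors
from pyspark.ml.functions import vector_to_array

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for my real feature DataFrame:
# "id" is the primary key, "embedding" is the VectorUDT column.
features_df = spark.createDataFrame(
    [(1, Vectors.dense([0.1, 0.2, 0.3])),
     (2, Vectors.dense([0.4, 0.5, 0.6]))],
    ["id", "embedding"],
)

# The conversion I attempted before writing to the Feature Store;
# on my real data this is where the error is thrown.
converted_df = features_df.withColumn(
    "embedding_array", vector_to_array(F.col("embedding"))
)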

* How can I fix this error and convert my VectorUDT data into arrays?

* Or, are there better practices for storing VectorUDT data in the Databricks Feature Store? A sketch of the write I am ultimately trying to perform follows below.
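
This is roughly the write I intend to run once the conversion works; the table name and primary key are placeholders, not my real ones:

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Write the converted DataFrame, dropping the original VectorUDT column
# since the Feature Store does not accept that type.
fs.create_table(
    name="feature_db.embedding_features",
    primary_keys=["id"],
    df=converted_df.drop("embedding"),
    description="Embeddings stored as array<double> instead of VectorUDT",
)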

Thank you for your help!
