
What are the best practices for storing VectorUDT columns in the Databricks Feature Store?

145625
New Contributor II

Hello, I am having trouble storing VectorUDT columns in the Databricks Feature Store. As far as I can tell, the Feature Store cannot handle this data type but can store arrays, so I tried converting my VectorUDT columns to arrays as a workaround.

However, whether I use .cast('array<double>') or the vector_to_array function from pyspark.ml.functions, I consistently get the following error:

SparkException: Failed to execute user defined function(functions$$$Lambda$9020/747195126: (array<struct<type:tinyint,size:int,indices:array<int>,values:array<double>>>) => array<double>)

Caused by: IllegalArgumentException: function vector_to_array requires a non-null input argument and input type must be `org.apache.spark.ml.linalg.Vector` or `org.apache.spark.mllib.linalg.Vector`, but got scala.collection.mutable.WrappedArray$ofRef.

* How can I fix this error and convert my VectorUDT data to an array?

* Or is there a better practice for storing VectorUDT data in the Databricks Feature Store?

Thank you for your help!
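
One possible reading of the error above: the UDF input type is printed as array<struct<type,size,indices,values>>, and the value it received was a WrappedArray, which suggests the column holds an array of vectors per row rather than a single vector. A minimal sketch under that assumption follows. The names vectors_to_lists and with_nested_array_column are hypothetical helpers written for illustration, not part of PySpark or the Feature Store API, and a Python UDF is used here as one workaround among several.

```python
from typing import Any, List, Optional, Sequence


def vectors_to_lists(
    vectors: Optional[Sequence[Any]],
) -> Optional[List[Optional[List[float]]]]:
    """Convert a sequence of ML vectors into plain nested lists of floats.

    Each element only needs a toArray() method, which DenseVector and
    SparseVector in pyspark.ml.linalg both provide. Nulls stay null.
    """
    if vectors is None:
        return None
    return [
        None if v is None else [float(x) for x in v.toArray()]
        for v in vectors
    ]


def with_nested_array_column(df, vector_array_col: str, output_col: str):
    """Add output_col (array<array<double>>) derived from vector_array_col.

    Assumes vector_array_col is an array whose elements are VectorUDT values.
    pyspark is imported lazily so the helper above is usable without Spark.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DoubleType

    to_nested = F.udf(vectors_to_lists, ArrayType(ArrayType(DoubleType())))
    return df.withColumn(output_col, to_nested(F.col(vector_array_col)))
```

Checking the schema with df.printSchema() first would confirm whether this assumption holds. If each row instead holds a single vector, vector_to_array from pyspark.ml.functions applies to the column directly and no UDF is needed.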

0 REPLIES 0
