TzachZohar

@kelleyrw it might be worth mentioning that your code works well with Spark 2.0 (I've tried it with 2.0.2). However, this behavior is still poorly documented, because tuples work for a UDF's return type but not for its input type:

  • For UDF output types, you should use plain Scala types (e.g. tuples) as the type of the array elements
  • For UDF input types, arrays that contain tuples (which Spark stores as arrays of structs) have to be declared as mutable.WrappedArray[Row]

So if you want to manipulate the input array and return the result, you have to convert each Row into a tuple explicitly.
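Here is a minimal sketch of that conversion, assuming Spark 2.x with Scala 2.11/2.12. The object name `TupleArrayUdfExample`, the function `incrementCounts` and the (String, Int) element type are hypothetical, chosen only for illustration. The input parameter is declared as mutable.WrappedArray[Row], each Row is turned into a tuple with getString/getInt, and the UDF returns plain Scala tuples:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}
import scala.collection.mutable

object TupleArrayUdfExample {
  // Hypothetical transformation: add 1 to the Int field of every (String, Int) element.
  // The input column is array<struct<_1:string,_2:int>>, which reaches the UDF as
  // mutable.WrappedArray[Row] rather than Seq[(String, Int)].
  def incrementCounts(rows: mutable.WrappedArray[Row]): Seq[(String, Int)] =
    rows.map(r => (r.getString(0), r.getInt(1) + 1)) // explicit Row -> tuple conversion

  // The return type Seq[(String, Int)] is a plain Scala type, so Spark can infer its schema.
  val incrementCountsUdf = udf(incrementCounts _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tuple-array-udf").getOrCreate()
    import spark.implicits._

    // One row whose "pairs" column is an array of (String, Int) structs.
    val df = Seq(Tuple1(Seq(("a", 1), ("b", 2)))).toDF("pairs")
    df.select(incrementCountsUdf(col("pairs")).as("pairs")).show(false)

    spark.stop()
  }
}
```

Declaring the parameter as Seq[(String, Int)] instead would typically compile but fail at runtime with a ClassCastException, because the elements Spark passes in are Row objects, not Tuple2 instances.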