I don't fully understand your problem, but I can say the following about UDFs:
1. Plain PySpark UDFs are slow because each row has to be serialized from the JVM to a Python worker process, transformed by the Python function, and then serialized back. Because this round trip happens row by row, it is extremely inefficient.
2. The next best approach in Python is a pandas UDF: data moves between the JVM and Python as Arrow batches, and your function is applied to whole batches (as pandas Series) rather than single rows, which makes it much faster.
3. Writing the UDF in Scala avoids the problem entirely: Scala runs on the JVM alongside Spark itself, so no serialization to and from a Python process is needed. This is the most efficient option.