Curious if you ever found a workable solution to this. Your question is still one of the top hits when I Google it. We are facing a similar challenge: we want to fuzzy-match high-volume lists of individuals in HDFS / Hive. Thinking ...