I have a list of client provided data, a list of company names.
I have to match those names with an internal database of company names. The client list can fit in memory (its about 10k elements) but the internal dataset is on hdfs and we use Spark ...