You could do this with Spark, storing the data in Parquet/Delta. For each face you would write out a record with a column for metadata, a column for the encoded vector array, and additional columns for hashing. You could then use a Pandas UDF to do the distributed distance calculation at scale, and could probably get fast run times on a million records.
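Here's a minimal sketch of what that Pandas UDF distance pass could look like, assuming PySpark 3.x, a table with an `array<float>` column named `encoding`, and Euclidean distance. The paths, column names, and the 128-dimension query vector are placeholders, not anything from your setup:

```python
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Hypothetical query encoding to match against the stored vectors.
query_vec = np.random.rand(128)
query_bc = spark.sparkContext.broadcast(query_vec)

@pandas_udf(DoubleType())
def euclidean_dist(encodings: pd.Series) -> pd.Series:
    # Each element of `encodings` is one face's vector (list/array of floats).
    q = query_bc.value
    return encodings.apply(lambda v: float(np.linalg.norm(np.asarray(v) - q)))

faces = spark.read.parquet("/path/to/faces")  # or .format("delta").load(...)

# Compute the distance to the query for every record and take the closest few.
matches = (faces
           .withColumn("dist", euclidean_dist("encoding"))
           .orderBy("dist")
           .limit(10))
matches.show()
```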
I'm not sure how you would come up with the hash criteria, but if you came up with some way to bin the vector encodings, you could add a column to the Parquet/Delta table recording which bin each vector falls into and then partition the table on that (or on a combination of multiple bins). If you set it up that way, you could ensure that your Pandas UDF only looks for close matches within the partition/bin, which will speed up the match time. The downside is that you will miss edge cases where a vector was put into one partition but its closest match was actually in another. A rough illustration of one possible binning scheme is sketched below.
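Purely as an illustration of the idea (the hash scheme here is my own assumption, not a recommendation): bucket each encoding by the sign pattern of its first few components, write the table partitioned on that bucket, and compute the same bucket for the query so only one partition gets scanned by the distance UDF.

```python
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
faces = spark.read.parquet("/path/to/faces")  # hypothetical table with an array<float> "encoding" column

@pandas_udf(StringType())
def encoding_bin(encodings: pd.Series) -> pd.Series:
    # Crude sign-based bucket on the first 4 dimensions -> at most 16 bins.
    return encodings.apply(
        lambda v: "".join("1" if x >= 0 else "0" for x in np.asarray(v)[:4]))

# Write the table partitioned by bin so queries can prune partitions.
(faces
 .withColumn("bin", encoding_bin("encoding"))
 .write
 .partitionBy("bin")
 .mode("overwrite")
 .parquet("/path/to/faces_binned"))

# At query time, compute the query's bin the same way and filter on it,
# so the distance UDF only runs over that one partition.
query_vec = np.random.rand(128)  # stand-in for the real query encoding
query_bin = "".join("1" if x >= 0 else "0" for x in query_vec[:4])
candidates = (spark.read.parquet("/path/to/faces_binned")
              .where(f"bin = '{query_bin}'"))
```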
For just a million records, I'd suggest avoiding binning altogether; if you need to, reduce the length of the encoded arrays instead.
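If you do need to shrink the arrays, one option (my assumption, not part of the original suggestion) is to fit a PCA offline and store the reduced vectors; scikit-learn and a 64-component target are used here just for the sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for your real encodings: 10k vectors of 128 float32 values.
encodings = np.random.rand(10_000, 128).astype(np.float32)

pca = PCA(n_components=64)
reduced = pca.fit_transform(encodings)  # 128-d -> 64-d, roughly halving storage
print(reduced.shape, pca.explained_variance_ratio_.sum())
```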