Best Database for facial recognition / fast comparisons of Euclidean distance
08-06-2021 03:32 AM
Hello people,
I'm trying to build a facial recognition application, and I have a working API that takes in an image of a face and spits out a vector that encodes it. I need to run this on a million faces, store the vectors in a db, and when the system goes online it should take a face in, get its vector, and compute the distance to all the stored vectors to find the closest one.
I'm hearing about locality-sensitive hashing, and that makes sense, but what else can I do at the level of db selection and design to make these lookups quicker? TIA
Labels: Database
08-06-2021 04:11 PM
You could do this with Spark, storing the data in Parquet/Delta. For each face you would write out a record with a column for metadata, a column for the encoded vector array, and additional columns for hashing. You could use a Pandas UDF to do the distributed distance calculation at scale, and could probably get fast run times on a million records.
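As a rough sketch of that brute-force lookup, assuming a Delta table called `faces` with a `face_id` column and an `embedding` array column (the table name, column names, and 128-dimensional vectors are just placeholders for illustration):

```python
# Minimal sketch: brute-force nearest-neighbour lookup over a Delta table of
# face embeddings using a Pandas UDF. Table/column names and the vector
# dimension (128) are assumptions for illustration.
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Hypothetical Delta table with columns: face_id (string), embedding (array<double>), metadata
faces = spark.table("faces")

# Vector returned by your encoding API for the incoming face (placeholder here).
query_vec = np.random.rand(128)
bc_query = spark.sparkContext.broadcast(query_vec)

@pandas_udf(DoubleType())
def euclidean_dist(embedding: pd.Series) -> pd.Series:
    q = bc_query.value
    # Each element is an array of floats; compute the L2 distance to the query.
    return embedding.apply(lambda v: float(np.linalg.norm(np.asarray(v, dtype=float) - q)))

closest = (
    faces
    .withColumn("dist", euclidean_dist(F.col("embedding")))
    .orderBy("dist")
    .limit(1)
)
closest.show()
```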
I'm not sure how you would come up with the hash criteria, but if you came up with some way to bin the vector encodings, you could add a column to the parquet/Delta table recording which bin each vector falls into and then partition the table on that (or on some combination of multiple bins). If you set it up that way, you could ensure that your Pandas UDF only looks for close matches within the partition/bin, which will speed up the match time. The downside is that you will miss edge cases where a vector was put into one partition but its closest match was actually in another.
For just a million records, I'd suggest avoiding binning and, if you need to, encoding your arrays to reduce their length.
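If you did want to experiment with binning anyway, one way to derive a bin key is a simple random-hyperplane LSH signature. The following is only a sketch of that idea under assumed names (the hyperplane count, vector dimension, table, and output path are all made up), and it inherits the edge-case caveat above:

```python
# Sketch: partition the Delta table by a coarse LSH bin so lookups only scan
# one partition. Hyperplane count, vector dimension, and paths are illustrative
# assumptions, not recommendations.
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
faces = spark.table("faces")  # same hypothetical table as in the sketch above

rng = np.random.default_rng(42)
hyperplanes = rng.standard_normal((4, 128))   # 4 random hyperplanes -> 2^4 = 16 bins
bc_planes = spark.sparkContext.broadcast(hyperplanes)

def signature(vec, planes):
    # Record which side of each hyperplane the vector falls on, packed into an int.
    bits = (planes @ np.asarray(vec, dtype=float)) > 0
    return int(sum(int(b) << i for i, b in enumerate(bits)))

@pandas_udf(IntegerType())
def lsh_bin(embedding: pd.Series) -> pd.Series:
    planes = bc_planes.value
    return embedding.apply(lambda v: signature(v, planes))

# Write the table partitioned by bin.
(faces
 .withColumn("bin", lsh_bin(F.col("embedding")))
 .write.format("delta")
 .mode("overwrite")
 .partitionBy("bin")
 .save("/tmp/faces_binned"))

# At query time, hash the probe vector the same way, scan only its partition,
# then run the distance UDF from the previous sketch on the candidates.
query_vec = np.random.rand(128)               # placeholder for the real encoding
query_bin = signature(query_vec, hyperplanes)
candidates = (
    spark.read.format("delta")
    .load("/tmp/faces_binned")
    .filter(F.col("bin") == query_bin)
)
```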
