A facial recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame. A subcategory of facial recognition, face verification, is the task of comparing two images of faces and verifying whether they match: two images are fed to the system, and based on the returned value we can decide whether they picture the same person. A useful application is authenticating people by photo. In this article I cover the fundamentals of face verification and explain how it can be implemented in an application that verifies a user’s identity by comparing a just-shot selfie with the picture contained in various kinds of identity documents, such as identity cards or driver’s licenses.
Under the hood, the face recognition system isolates the faces in the two images and calculates the Euclidean distance between them, returning a value between 0 and 4. The higher the distance, the higher the probability that the faces belong to different people, so the system can validate the user if this value is below a given threshold. After some testing I found out that the “acceptable” value varies quite a bit from one library to another, mostly because they are backed by different trained models.
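The decision rule itself is simple enough to sketch in a few lines. The function and threshold below are illustrative, not taken from any specific library; as noted above, the right threshold depends on the trained model behind the system:

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(encoding_a, encoding_b, threshold=0.6):
    # Accept the match only when the distance is below the threshold.
    # 0.6 is a commonly seen default, but it is NOT universal: each
    # trained model calls for its own value.
    return euclidean_distance(encoding_a, encoding_b) < threshold
```

In a real application, `encoding_a` and `encoding_b` would be the vectors the recognition model extracts from the selfie and the document photo.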
STEP BY STEP
First of all, the face in the image must be detected and isolated. As with most object detection architectures, this means evaluating each pixel based on the surrounding area of the image. By examining how bright or dark a pixel is compared with the ones around it, it is possible to synthesize the image information as gradients, which show the flow from light to dark across the image. At this point the system can find a face in the image by looking for regions where the gradients are similar to those of a face.
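The gradient idea can be shown with a toy computation on a grayscale image stored as a 2D list. This is only the per-pixel building block; real detectors aggregate such gradients into histograms over small cells before matching them against a face pattern:

```python
import math

def pixel_gradient(img, x, y):
    # Compare the pixel's horizontal and vertical neighbours to find
    # how strongly, and in which direction, brightness is changing.
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    magnitude = math.hypot(dx, dy)
    direction = math.degrees(math.atan2(dy, dx))
    return magnitude, direction
```

For example, on a patch that ramps from dark on the left to bright on the right, the gradient points horizontally and its magnitude reflects how steep the ramp is.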
After being isolated from the rest of the image, the face must be warped so that its features are placed in known positions. To do this, 68 key points called landmarks are found on the image. They identify features like the eyes, eyebrows, nose, mouth and jaw. The image is then rotated, scaled and sheared to achieve the standard format that will be used to compare it with other faces.
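The simplest case of this warping is rotating the face so the two eye centres lie on a horizontal line. The coordinates below are hypothetical; a full pipeline derives an affine transform from all 68 landmarks, not just the eyes:

```python
import math

def alignment_angle(left_eye, right_eye):
    # Angle (in degrees) by which the image must be rotated
    # so that the two eye centres become level.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_point(p, center, angle_deg):
    # Rotate landmark p around center by angle_deg
    # (counter-clockwise, standard mathematical convention).
    rad = math.radians(angle_deg)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(rad) - dy * math.sin(rad),
            center[1] + dx * math.sin(rad) + dy * math.cos(rad))
```

Rotating every landmark by the negated alignment angle levels the eyes, after which scaling and shearing bring the face into the standard format.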
From the isolated and correctly oriented face, a 128-dimensional numerical vector is produced. This vector contains measurements of the facial features, and the Euclidean distance between two such vectors tells how different one face is from another. As explained above, a distance below a certain threshold indicates that the two faces belong to the same person.
Measurements of the same person’s face must be similar across photos, and measurements of different people’s faces must be different, no matter how alike those people look. A method called triplet loss is used to train a deep convolutional neural network to do so. As the name suggests, triplet loss bases its calculation on three images at a time: an anchor image, an image of the same face (the positive example) and one of a different face (the negative example). During training, the network’s weights are adjusted so that the anchor’s measurements move closer to the positive example’s and farther away from the negative’s.
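The loss itself is compact enough to write down. This is a plain-Python sketch of the standard formulation with Euclidean distances; in a real training loop the three encodings come from the network and this value is minimised by gradient descent, and the margin is an illustrative hyperparameter:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length encodings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # The loss is zero (nothing to fix) only when the anchor is closer
    # to the positive than to the negative by at least `margin`.
    return max(euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin, 0.0)
```

A well-separated triplet thus contributes nothing to the loss, while a triplet where the negative is too close pushes the network to adjust.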
Author: Simone Grossi
Info or suggestions to s.grossi -at- quidinfo.it