Second Evaluation Report

This page outlines the work completed as of the second evaluation in the GSoC program. It includes code examples showing how to use the module, image results, theory, and explanations of the design. Rather than being sectioned by week, the page is organised according to the various submodules in the system.

To start off, the following is the work that was proposed to be completed by the second evaluation (listed on the last page of the proposal):


Most of the tasks have been completed. The few remaining tasks, which will be completed in a few days, are the DAN network (whose model size I am currently trying to minimize), documentation of the classes, and additional parameters for the Face Recognition Module. The purple-highlighted portions are yet to be completed.

A more intensive few days should take care of these pending tasks, which were delayed due to a medical emergency during the second evaluation period of GSoC.

Face Recognition Module (FRM, part I)

The first task was to integrate Face Detection and Face Alignment into the Face Recognition Module. Although the default alignment and detection methods (FaceBoxes and KazemiSullivan) are used for now, users will be able to specify which detection/alignment method to use, along with the associated parameters.

The alignment and detection step is used in both the training and testing phases of FRM. This is because the alignment methods being used also require a bounding box, and the faces supplied for training are not necessarily cropped to the image border, i.e., the image could contain objects apart from the face. For this reason, face detection is run in the training phase too.
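As a rough sketch of that shared pre-processing path (the function below and the stub detector/aligner are hypothetical illustrations of the flow, not the actual FRM internals):

```python
from typing import Callable, List

def prepare_faces(images: List, detect: Callable, align: Callable) -> List:
    """Run face detection, then alignment, on every input image.

    This runs in both the training and testing phases, since
    training images may contain more than just a face.
    """
    faces = []
    for im in images:
        for box in detect(im):            # bounding boxes from the detector
            faces.append(align(im, box))  # aligned 112x112 face crop
    return faces

# Stub detector/aligner, standing in for FaceBoxes and KazemiSullivan.
detect = lambda im: [(0, 0, 112, 112)]
align = lambda im, box: (im, box)

faces = prepare_faces(["img1", "img2"], detect, align)
```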

Face Embeddings

Face Identification, just like Detection and Alignment, follows the same abstract base class structure for the embedding methods. The "embedders", the methods that generate vector embeddings of the faces, are separated from the identification and verification process. The FaceEmbedder interface just requires passing the faces (of a fixed \(112\times 112\) size, which is also how the alignment methods return them).
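A minimal sketch of that abstract-base-class structure (`DummyEmbedder` and its mean-pixel "embedding" are toy stand-ins for illustration, not a real embedder):

```python
from abc import ABC, abstractmethod

import numpy as np

class FaceEmbedder(ABC):
    """Base class for embedders: 112x112 face crops in, vectors out."""

    @abstractmethod
    def _embed(self, faces):
        """faces: list of (112, 112, 3) arrays -> (N, D) embedding array."""

class DummyEmbedder(FaceEmbedder):
    """Toy embedder: the mean pixel value as a 1-D 'embedding'."""

    def _embed(self, faces):
        return np.array([[f.mean()] for f in faces])

faces = [np.zeros((112, 112, 3)), np.ones((112, 112, 3))]
emb = DummyEmbedder()._embed(faces)
```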


The embedder that was added is the MobileFaceNet face embedder. Like FaceBoxes, MobileFaceNet has been implemented in TensorFlow and the trained model is shipped with the package, which is possible due to its small size (as opposed to DAN, whose model exceeds 100 MB). In experiments, MobileFaceNet runs in real time, reaching up to 25 fps on the evaluator. The evaluator results are shown in the next section. The documentation, which will be put up in a few days, will contain the same details.

The following shows the ease of calling MobileFaceNet for generating the embeddings:

import cv2

def transform_image(im):
    # pre-processing expected by MobileFaceNet:
    # resize to 112x112 and normalize pixel values to [-1, 1)
    im = cv2.resize(im, (112, 112))
    im = (im - 127.5) / 128.0
    return im

mfn = MobileFaceNet()
emb1 = mfn._embed([transform_image(im1)])
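Once two faces are embedded, verification reduces to comparing the vectors. A common choice, shown here as an illustration rather than FRM's exact internals, is cosine similarity against a threshold:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(emb_a, emb_b, threshold=0.5):
    # The 0.5 threshold is illustrative; in practice it is tuned
    # on a verification set.
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy 2-D "embeddings": a and b point the same way, c is near-orthogonal.
a = np.array([1.0, 0.2])
b = np.array([0.9, 0.1])
c = np.array([-0.1, 1.0])
```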

Face Embeddings Evaluation

The next step was to evaluate the performance of each embedder. For this we used the LFW dataset, which contains thousands of faces and comes with a testing scheme of similar and dissimilar image pairs. FaceEmbedderEvaluator, like the other evaluators, can be used with any general face verification dataset. The testing scheme used contains a total of 3000 pairs; although the LFW scheme involves averaging over multiple folds of the training set, our results are taken over the entirety of the testing pairs. This is because our model is not meant to be trained on separate folds, but rather on a fixed training set.

FaceEmbedderEvaluator computes the following:

  • Accuracy: The fraction of true positives and true negatives across all the predicted outputs for the pairs.

  • ROC Curve: A plot of the true positive rate against the false positive rate as the decision threshold varies. Ideally the curve rises early: at low false positive rates there should already be high true positive rates.

  • AUC: As noted above, the ROC curve should rise quickly, so the area under the curve (AUC) should be as large as possible.

  • EER: The Equal Error Rate is a useful metric for two-class problems where the cost of a wrong prediction is similar for either class. On the ROC plot, the EER is the x-axis value (false positive rate) at the point where the diagonal from the top-left corner intersects the curve. It is the operating point where the false positive and false negative rates are equal.
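On a toy set of pair scores, these four metrics can be sketched with plain NumPy (the numbers below are illustrative, not LFW results or FaceEmbedderEvaluator internals):

```python
import numpy as np

# Toy verification scores: label 1 = same person, 0 = different.
labels = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1])

# Accuracy at a fixed decision threshold of 0.5.
accuracy = float(((scores >= 0.5).astype(int) == labels).mean())

# ROC curve: sweep every score as a threshold (descending).
thresholds = np.sort(np.unique(scores))[::-1]
tpr = np.array([(scores[labels == 1] >= t).mean() for t in thresholds])
fpr = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])

# AUC by trapezoidal integration, prepending the (0, 0) corner.
x, y = np.r_[0.0, fpr], np.r_[0.0, tpr]
auc = float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2))

# EER: the point where false positive and false negative rates meet.
eer = float(fpr[np.argmin(np.abs(fpr - (1 - tpr)))])
```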

As with the other datasets, the LFW dataset is added (with a modified directory structure and essential meta-data) as a git LFS tar.gz file. For convenience, the above evaluation can be run directly through the notebook in the tests/performance/face_embedding directory.


Face Identification

The FaceIdentifier class is a wrapper around the embedders and the FaceKNN class. FaceKNN performs the KNN training and testing, while FaceIdentifier calls on both FaceKNN and the embedder (to obtain the embeddings) and returns the results for the training and testing phases.
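The idea behind FaceKNN can be sketched as a nearest-neighbour lookup over embedding vectors (this toy 1-NN class is an illustration; the real class and its parameters may differ):

```python
import numpy as np

class ToyFaceKNN:
    """Toy 1-nearest-neighbour identifier over face embeddings."""

    def train(self, embeddings, labels):
        self.embeddings = np.asarray(embeddings, dtype=float)
        self.labels = list(labels)

    def test(self, embeddings):
        preds = []
        for e in np.asarray(embeddings, dtype=float):
            # Euclidean distance to every training embedding;
            # predict the label of the closest one.
            dists = np.linalg.norm(self.embeddings - e, axis=1)
            preds.append(self.labels[int(np.argmin(dists))])
        return preds

knn = ToyFaceKNN()
knn.train([[0.0, 0.0], [1.0, 1.0]], ["alice", "bob"])
```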

FaceIdentifier can be used independently of the FRM module if detection and alignment are not necessary. A straightforward way to do so is shown below:

from rekognition.face_identification import FaceIdentifier
FI = FaceIdentifier(training_faces_list, training_faces_labels)
pred_results = FI(testing_faces_list)

Face Recognition Module (FRM, Part II)

The remaining work went into completing the Face Recognition Module. It currently integrates the detection, alignment and identification (which includes embedding) steps. The next step for FRM is to let the user customize the models used within it.

At its most basic, FRM can be used as:

from rekognition import FRM
frm = FRM()
frm.train(faces, labels)
frm.test(images) # returns all faces detected

Another remaining task is to return more verbose details about the detected faces, since that can be useful to the user.

Next Step

The second evaluation period has three remaining tasks: documentation, some additional customization of FRM, and the addition of DAN. DAN got postponed due to its large model size; one possible way to add it is via a user-defined model location, but I am currently experimenting with methods to reduce the model's size.

Another piece of work I am doing in parallel is Shot Recognition for videos. These should be finished in the next few days.