Machine Learning for Signal Processing - 2P2018

Assignment 2: A Simple Face Detector

Due date: April 14, 11:59 PM

Adapted from Bhiksha Raj.

Problem 1: A simple face detector

You are given a corpus of facial images [here] from the LFWCrop database. Each image in this corpus is 64 x 64 and grayscale. You must learn a typical (i.e. Eigen) face from them.

You are also given four group photographs with multiple faces [here]. You must use the Eigen face you have learnt to detect the faces in these photos.

The faces in the group photographs may have different sizes. You must account for these variations.

Matlab is strongly recommended but you are free to use other programs if you want.

Some hints on how to read image files into Matlab can be found here.

You must compute the first Eigen face from this data. To do so, you will have to read all images into a matrix. Here are instructions for building a matrix of images in Matlab. You must then compute the first Eigen vector for this matrix. Information on computing Eigen faces from an image matrix can be found here.
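The step above can be sketched as follows. This is only an illustration in Python with NumPy rather than Matlab, and it makes two assumptions that are not stated in the assignment: the images are already loaded into an array of shape (number of images, height, width), and the mean is not subtracted before the decomposition, so the first Eigen face resembles an average face. The function name `first_eigenface` is hypothetical.

```python
import numpy as np

def first_eigenface(images):
    """Return the first Eigen face of a stack of equally sized grayscale images.

    images: array of shape (num_images, height, width).
    Assumption: no mean subtraction is applied before the decomposition.
    """
    num_images, height, width = images.shape
    # Unroll each image into a column: X has shape (pixels, num_images).
    X = images.reshape(num_images, height * width).T.astype(float)
    # The first left singular vector of X is the first eigenvector of X X^T.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    eigenface = U[:, 0]
    # The sign of a singular vector is arbitrary; flip it so the pixel sum is positive.
    if eigenface.sum() < 0:
        eigenface = -eigenface
    return eigenface.reshape(height, width)
```

The returned Eigen face has unit norm, which keeps match scores comparable if it is later rescaled and renormalized.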

To detect faces in the image, you must scan the group photo and identify the regions in it that best “match” the pattern in the Eigen face. To “scan” the image for matches against an $N\times M$ Eigen face, you must match every $N\times M$ region of the photo against the Eigen face.

The match between any $N\times M$ region of an image and an Eigen face is given by the normalized dot product between the Eigen face and the region of the image being evaluated. The normalized dot product between an $N\times M$ Eigen face and a corresponding $N\times M$ segment of the image is given by $E\cdot P / \vert P \vert$, where $E$ is the vector (unrolled) representation of the Eigen face, and $P$ is the unrolled vector form of the $N\times M$ patch.
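A minimal sketch of this scan in Python with NumPy is shown below. It computes $E\cdot P / \vert P \vert$ for every patch position with a plain double loop, which is slow for large photos but mirrors the definition directly. The function name `scan_image` and the choice to give all-zero patches a score of 0 are assumptions made for illustration.

```python
import numpy as np

def scan_image(image, eigenface):
    """Return a map of E.P/|P| scores, one per top-left patch position."""
    n, m = eigenface.shape
    rows, cols = image.shape
    E = eigenface.ravel().astype(float)
    scores = np.zeros((rows - n + 1, cols - m + 1))
    for i in range(rows - n + 1):
        for j in range(cols - m + 1):
            # Unroll the N x M patch whose top-left corner is (i, j).
            P = image[i:i + n, j:j + m].ravel().astype(float)
            norm = np.linalg.norm(P)
            # Assumption: an all-zero patch has no direction, so its score stays 0.
            if norm > 0:
                scores[i, j] = E @ P / norm
    return scores
```

The position of the largest score can then be read off with `np.unravel_index(np.argmax(scores), scores.shape)`.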

A simple Matlab loop that scans an image for an Eigen vector is given here.

The locations of faces are likely to be where the match score peaks.

Some tricks may be useful to get better results.

Scaling and Rotation

The Eigen face is fixed in size and can only be used to detect faces of approximately the same size as the Eigen face itself. On the other hand, faces in the group photos are of different sizes – they get smaller as the subject gets farther away from the camera.

The solution to this is to make many copies of the Eigen face at different sizes and match them all.

In order to make your detection system robust, resize the Eigen face from 64x64 pixels to 32x32, 48x48, 96x96, and 128x128 pixels. You can use the scaling techniques we discussed in the linear algebra lecture. Matlab also provides some easy tools for scaling images. You can find information on scaling images in Matlab here. Once you have scaled your Eigen face, you will have a total of five “typical” faces, one at each level of scaling. You must scan the group pictures with all five Eigen faces. Each of them will give you a “match” score for each position in the image. If you simply locate the peaks in each of them, you may find all the faces. Sometimes multiple peaks will occur at the same position, or within a few pixels of one another. In these cases you can merge them, since they probably all represent the same face.
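One possible way to build the five scaled faces is sketched below in Python with NumPy, using bilinear interpolation. The function names `resize_bilinear` and `eigenface_pyramid` are hypothetical. Renormalizing each resized face to unit norm is an assumption made here so that scores from different scales are easier to compare; it is not something the assignment prescribes. Shrinking by plain interpolation without prior smoothing can also alias fine detail.

```python
import numpy as np

def resize_bilinear(image, new_rows, new_cols):
    """Resize a 2-D array with bilinear interpolation."""
    rows, cols = image.shape
    # Source coordinates sampled for each output pixel (corners map to corners).
    r = np.linspace(0, rows - 1, new_rows)
    c = np.linspace(0, cols - 1, new_cols)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, rows - 1)
    c1 = np.minimum(c0 + 1, cols - 1)
    fr = (r - r0)[:, None]   # fractional row offsets
    fc = (c - c0)[None, :]   # fractional column offsets
    top = image[np.ix_(r0, c0)] * (1 - fc) + image[np.ix_(r0, c1)] * fc
    bottom = image[np.ix_(r1, c0)] * (1 - fc) + image[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bottom * fr

def eigenface_pyramid(eigenface, sizes=(32, 48, 64, 96, 128)):
    """Return a dict mapping size -> unit-norm Eigen face of shape (size, size)."""
    pyramid = {}
    for s in sizes:
        resized = resize_bilinear(eigenface.astype(float), s, s)
        # Assumption: renormalize so every scale has unit norm.
        pyramid[s] = resized / np.linalg.norm(resized)
    return pyramid
```

Each entry of the returned dictionary can be passed to the scanning routine in turn, giving one score map per scale.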

Additional heuristics may also be required (appropriate setting of thresholds, comparison of peak values from different scaling factors, additional scaling, etc.). These are for you to investigate.
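As one example of such a heuristic, nearby peaks from different scales can be merged greedily: keep the strongest detection, discard any other detection whose centre lies within a chosen distance of it, and repeat. The sketch below, in Python, assumes detections have already been collected as `(score, row, col, size)` tuples with `(row, col)` the patch centre. The function name `merge_peaks`, the tuple layout, and the distance rule are all assumptions for illustration, and the sketch treats scores from different scales as directly comparable, which may not hold in practice.

```python
def merge_peaks(detections, min_distance):
    """Greedily keep the strongest detections.

    detections: list of (score, row, col, size) tuples, (row, col) = patch centre.
    A detection is dropped if it lies closer than min_distance pixels
    to a stronger detection that has already been kept.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        _, r, c, _ = det
        far_from_all = all(
            (r - kr) ** 2 + (c - kc) ** 2 >= min_distance ** 2
            for _, kr, kc, _ in kept
        )
        if far_from_all:
            kept.append(det)
    return kept
```

A score threshold can be applied before merging by filtering the list of detections.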

[More hints]

Problem 2: A boosting based face detector

You are given a training corpus of facial images. You must learn the first K Eigen faces from the corpus. Set K = 10 initially but vary it appropriately such that you get the best results. Mean and variance normalize the images before computing Eigenfaces.
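A sketch of this step in Python with NumPy follows. It interprets "mean and variance normalize" as giving each image zero mean and unit variance over its own pixels, which is an assumption; normalizing each pixel across the corpus is another possible reading. The function names are hypothetical.

```python
import numpy as np

def normalize_images(images):
    """Give every image zero mean and unit variance over its own pixels.

    images: array of shape (num_images, height, width).
    Returns an array of shape (num_images, pixels).
    """
    X = images.reshape(len(images), -1).astype(float)
    X = X - X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant images stay at zero instead of dividing by 0
    return X / std

def top_k_eigenfaces(images, k=10):
    """Return the first k Eigen faces as the rows of a (k, pixels) matrix."""
    X = normalize_images(images)
    # Rows of Vt are orthonormal directions in pixel space,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]
```

Because the rows are orthonormal, the weight of each Eigen face in an image is a plain dot product, as used in the next step.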

You are given a second training set of facial images. Express each image as a linear combination of the Eigen faces, i.e., express each face $F$ as
\[ F \approx w_{F,1}E_1 + w_{F,2}E_2 + w_{F,3}E_3 + \cdots + w_{F,K}E_K \] where $E_i$ is the $i$th Eigen face and $w_{F,i}$ is the weight of the $i$th Eigen face when composing face $F$. The weight $w_{F,i}$ can, of course, be computed as the dot product of $F$ and $E_i$.

Represent each face by the set of weights for the Eigen faces, i.e. $F \rightarrow \{w_{F,1}, w_{F,2}, \cdots, w_{F,K}\}$.

You are also given a collection of non-face images in the dataset. Represent each of these images too as linear combinations of the Eigen faces, i.e. express each non-face image $NF$ as
\[ NF \approx w_{NF,1}E_1 + w_{NF,2}E_2 + w_{NF,3}E_3 + \cdots + w_{NF,K}E_K \]

As before, the weights $w_{NF,i}$ can be computed as dot products. Represent each of the non-face images by the set of weights, i.e. $NF \rightarrow \{w_{NF,1}, w_{NF,2}, \cdots, w_{NF,K}\}$.

The sets of weights for the Eigen faces are the features representing all the face and non-face images.
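Computing these features amounts to a single matrix product. The sketch below, in Python with NumPy, assumes that the images are unrolled into the rows of a matrix, that the Eigen faces are the orthonormal rows of a `(K, pixels)` matrix, and that the images have been normalized in the same way as the data used to learn the Eigen faces. The function name `eigenface_weights` is hypothetical.

```python
import numpy as np

def eigenface_weights(images, eigenfaces):
    """Return the (num_images, K) matrix of weights w_{F,i} = F . E_i.

    images:     (num_images, pixels), one unrolled image per row.
    eigenfaces: (K, pixels), one unrolled Eigen face per row (assumed orthonormal).
    """
    return images @ eigenfaces.T
```

The same function applies to face and non-face images, so both classes end up in the same K-dimensional feature space.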

From the set of face and non-face images represented by the Eigenface weights, learn an AdaBoost classifier to classify faces vs. non-faces.
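A minimal sketch of AdaBoost in Python with NumPy is given below. It uses single-feature threshold classifiers (decision stumps) as weak learners, which is an assumption: the assignment does not say which weak learner to use. Labels are assumed to be +1 for faces and -1 for non-faces, and the function names are hypothetical. The exhaustive stump search is written for clarity, not speed.

```python
import numpy as np

def train_adaboost(X, y, num_rounds=50):
    """Train AdaBoost with threshold stumps.

    X: (num_samples, num_features) features, e.g. Eigen face weights.
    y: integer labels in {+1, -1} (+1 = face).
    Returns a list of (feature, threshold, polarity, alpha) tuples.
    """
    n, d = X.shape
    D = np.full(n, 1.0 / n)  # sample weights, initially uniform
    model = []
    for _ in range(num_rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for f in range(d):
            for thr in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, pred)
        err, f, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        model.append((f, thr, pol, alpha))
        # Increase the weight of misclassified samples, then renormalize.
        D = D * np.exp(-alpha * y * pred)
        D = D / D.sum()
    return model

def adaboost_margin(model, X):
    """Return H(x) = sum_t alpha_t h_t(x) for each row of X."""
    H = np.zeros(len(X))
    for f, thr, pol, alpha in model:
        H += alpha * np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
    return H
```

An image is classified as a face when `adaboost_margin` returns a positive value for its feature vector.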

You are given a fourth set which is a collection of face and non-face images. Use the AdaBoost classifier to classify these images.

The classifier you have learned will be for the same size of images that were used in the training data (64 x 64). Scale the classifier by scaling the Eigenfaces to other sizes (32 x 32, 48 x 48, 96 x 96, 128 x 128).

Train and test data for this problem are here. It is a collection of face and non-face data. Use the data in the "train" subdirectory to train your classifier and classify the data in the "test" subdirectory.

Problem 3 (optional)

It will generally not be possible to represent a face exactly using a limited number of typical faces; as a result there will be an error between the face $F$ and the approximation in terms of the $K$ Eigenfaces. You can also compute the normalized total error in representation as:
\[ err_F = \frac{1}{N}\parallel F - \sum_i w_{F,i}E_i \parallel^2 \] where, $\parallel \bullet \parallel^2$ represents the sum of the squares of the error of each pixel, and $N$ represents the number of pixels in the image.

Represent each face by the set of weights for the Eigen faces and the error, i.e. $F \rightarrow \{w_{F,1}, w_{F,2}, \cdots, w_{F,K}, err_F\}$.

As in the case of faces, the approximation of the non-face images in terms of Eigenfaces will not be exact and will result in error. You can compute the normalized total error as you did for the face images to obtain $err_{NF}$.

Represent each of the non-face images by the set of weights and the error, i.e. $NF \rightarrow \{w_{NF,1}, w_{NF,2}, \cdots, w_{NF,K}, err_{NF}\}$.
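The extended feature vector can be sketched in Python with NumPy as follows. The code assumes the images are unrolled into rows and the Eigen faces are the orthonormal rows of a `(K, pixels)` matrix. The function name `weights_and_error` is hypothetical. The same function serves for both face and non-face images.

```python
import numpy as np

def weights_and_error(images, eigenfaces):
    """Return a (num_images, K + 1) matrix: K weights followed by the error.

    images:     (num_images, pixels), one unrolled image per row.
    eigenfaces: (K, pixels), one unrolled Eigen face per row (assumed orthonormal).
    """
    W = images @ eigenfaces.T                  # weights w_i = F . E_i
    residual = images - W @ eigenfaces         # F - sum_i w_i E_i
    num_pixels = images.shape[1]
    err = (residual ** 2).sum(axis=1) / num_pixels
    return np.hstack([W, err[:, None]])
```

An image that lies entirely in the span of the Eigen faces gets an error of 0, while an image orthogonal to all of them gets its mean squared pixel value as the error.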

Learn and build a classifier in the same way you did for Problem 2, but include the normalized error as an additional feature. Use this classifier for Problem 4.

Problem 4 (optional)

Scan the group photographs to detect faces using your AdaBoost classifier.

You can adjust the tradeoff between missing faces and false alarms by comparing the margin $H(x)$ of the AdaBoost classifier to a threshold other than 0.
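The effect of the threshold can be explored with a short sketch like the one below, written in Python with NumPy. It assumes the margins $H(x)$ have already been computed for a labelled set, with labels +1 for faces and -1 for non-faces, and that both classes are present. The function names are hypothetical.

```python
import numpy as np

def classify_with_threshold(margins, threshold=0.0):
    """Label +1 (face) where H(x) > threshold, otherwise -1."""
    return np.where(np.asarray(margins) > threshold, 1, -1)

def miss_and_false_alarm_rates(margins, labels, threshold):
    """Return (fraction of faces missed, fraction of non-faces accepted)."""
    pred = classify_with_threshold(margins, threshold)
    labels = np.asarray(labels)
    miss = float(np.mean(pred[labels == 1] == -1))
    false_alarm = float(np.mean(pred[labels == -1] == 1))
    return miss, false_alarm
```

Raising the threshold above 0 trades missed faces for fewer false alarms, and lowering it does the opposite. Sweeping the threshold over a range of values shows where a reasonable operating point lies.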

Submission Details

What to submit: