This notebook contains an excerpt from the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub.

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

< In-Depth: Kernel Density Estimation | Contents | Further Machine Learning Resources >

Application: A Face Detection Pipeline

This chapter has explored a number of the central concepts and algorithms of machine learning. But moving from these concepts to real-world application can be a challenge. Real-world datasets are noisy and heterogeneous, may have missing features, and data may be in a form that is difficult to map to a clean [n_samples, n_features] matrix. Before applying any of the methods discussed here, you must first extract these features from your data: there is no formula for how to do this that applies across all domains, and thus this is where you as a data scientist must exercise your own intuition and expertise.

One interesting and compelling application of machine learning is to images, and we have already seen a few examples of this where pixel-level features are used for classification. In the real world, data is rarely so uniform, and simple pixels will not be suitable: this has led to a large literature on feature extraction methods for image data (see Feature Engineering).

In this section, we will take a look at one such feature extraction technique, the Histogram of Oriented Gradients (HOG), which transforms image pixels into a vector representation that is sensitive to broadly informative image features regardless of confounding factors like illumination. We will use these features to develop a simple face detection pipeline, using machine learning algorithms and concepts we've seen throughout this chapter.

We begin with the standard imports:

In [1]:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np

HOG Features

The Histogram of Oriented Gradients is a straightforward feature extraction procedure that was developed in the context of identifying pedestrians within images. HOG involves the following steps:

1. Optionally pre-normalize images. This leads to features that resist dependence on variations in illumination.
2. Convolve the image with two filters that are sensitive to horizontal and vertical brightness gradients. These capture edge, contour, and texture information.
3. Subdivide the image into cells of a predetermined size, and compute a histogram of the gradient orientations within each cell.
4. Normalize the histograms in each cell by comparing to the block of neighboring cells. This further suppresses the effect of illumination across the image.
5. Construct a one-dimensional feature vector from the information in each cell.
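
To make the gradient and per-cell histogram steps above concrete, here is a minimal NumPy sketch. It is an illustration only, not scikit-image's implementation: it uses np.gradient in place of explicit convolution filters, weights each pixel by its gradient magnitude, treats orientations as unsigned (0 to 180 degrees), and skips the block normalization step entirely. The function name cell_orientation_histograms and its defaults are assumptions chosen for this example.

```python
import numpy as np

def cell_orientation_histograms(image, cell_size=8, n_bins=9):
    """Magnitude-weighted histograms of unsigned gradient orientation per cell."""
    image = np.asarray(image, dtype=float)
    # Brightness gradients along rows (vertical) and columns (horizontal)
    grad_row, grad_col = np.gradient(image)
    magnitude = np.hypot(grad_row, grad_col)
    # Unsigned orientation in degrees, folded into the range 0 to 180
    angle = np.rad2deg(np.arctan2(grad_row, grad_col)) % 180

    n_rows = image.shape[0] // cell_size
    n_cols = image.shape[1] // cell_size
    hists = np.zeros((n_rows, n_cols, n_bins))
    bin_edges = np.linspace(0, 180, n_bins + 1)
    for r in range(n_rows):
        for c in range(n_cols):
            rows = slice(r * cell_size, (r + 1) * cell_size)
            cols = slice(c * cell_size, (c + 1) * cell_size)
            # One histogram per cell, each pixel weighted by gradient strength
            hists[r, c], _ = np.histogram(angle[rows, cols], bins=bin_edges,
                                          weights=magnitude[rows, cols])
    return hists

# A horizontal brightness ramp: every gradient points along the columns
ramp = np.tile(np.arange(16.0), (16, 1))
hists = cell_orientation_histograms(ramp)
print(hists.shape)           # (2, 2, 9)
print(hists[0, 0].argmax())  # 0: all weight falls in the first orientation bin
```

Flattening the resulting array with hists.ravel() would give a one-dimensional feature vector, although without the neighborhood normalization it would remain sensitive to illumination.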

A fast HOG extractor is built into the Scikit-Image project, and we can try it out relatively quickly and visualize the oriented gradients within each cell:

In [2]:


from skimage import data, color, feature
import skimage.data

image = color.rgb2gray(data.chelsea())
# visualize=True also returns an image rendering of the per-cell gradients
hog_vec, hog_vis = feature.hog(image, visualize=True)

fig, ax = plt.subplots(1, 2, figsize=(12, 6),
                       subplot_kw=dict(xticks=[], yticks=[]))
ax[0].imshow(image, cmap='gray')
ax[0].set_title('input image')

ax[1].imshow(hog_vis)
ax[1].set_title('visualization of HOG features');

HOG in Action: A Simple Face Detector

Using these HOG features, we can build up a simple facial detection algorithm with any Scikit-Learn estimator; here we will use a linear support vector machine (refer back to In-Depth: Support Vector Machines if you need a refresher on this). The steps are as follows:

1. Obtain a set of image thumbnails of faces to constitute "positive" training samples.
2. Obtain a set of image thumbnails of non-faces to constitute "negative" training samples.
3. Extract HOG features from these training samples.
4. Train a linear SVM classifier on these samples.
5. For an "unknown" image, pass a sliding window across the image, using the model to evaluate whether that window contains a face or not.
6. If detections overlap, combine them into a single window.

Let's go through these steps and try it out:

1. Obtain a set of positive training samples

Let's start by finding some positive training samples that show a variety of faces. We have one easy set of data to work with—the Labeled Faces in the Wild dataset, which can be downloaded by Scikit-Learn:

In [3]:

from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people()
positive_patches = faces.images
positive_patches.shape

This gives us a sample of about 13,000 face images to use for training.

2. Obtain a set of negative training samples

Next we need a set of similarly sized thumbnails which do not have a face in them. One way to do this is to take any corpus of input images, and extract thumbnails from them at a variety of scales. Here we can use some of the images shipped with Scikit-Image, along with Scikit-Learn's PatchExtractor:

In [4]:


from skimage import data, transform

imgs_to_use = ['camera', 'text', 'coins', 'moon',
               'page', 'clock', 'immunohistochemistry',
               'chelsea', 'coffee', 'hubble_deep_field']
raw_images = [getattr(data, name)() for name in imgs_to_use]
# Several of these images are already 2D grayscale arrays, which rgb2gray
# rejects in recent scikit-image releases, so convert only the RGB ones
images = [color.rgb2gray(im) if im.ndim == 3 else im
          for im in raw_images]

In [5]:

from sklearn.feature_extraction.image import PatchExtractor

def extract_patches(img, N, scale=1.0, patch_size=positive_patches[0].shape):
    extracted_patch_size = tuple((scale * np.array(patch_size)).astype(int))
    extractor = PatchExtractor(patch_size=extracted_patch_size,
                               max_patches=N, random_state=0)
    patches = extractor.transform(img[np.newaxis])
    if scale != 1:
        patches = np.array([transform.resize(patch, patch_size)
                            for patch in patches])
    return patches

negative_patches = np.vstack([extract_patches(im, 1000, scale)
                              for im in images for scale in [0.5, 1.0, 2.0]])
negative_patches.shape

We now have 30,000 suitable image patches which do not contain faces. Let's take a look at a few of them to get an idea of what they look like:

In [6]:

fig, ax = plt.subplots(6, 10)
for i, axi in enumerate(ax.flat):
    axi.imshow(negative_patches[500 * i], cmap='gray')
    axi.axis('off')

Our hope is that these will sufficiently cover the space of "non-faces" that our algorithm is likely to see.

3. Combine sets and extract HOG features

Now that we have these positive samples and negative samples, we can combine them and compute HOG features. This step takes a little while, because the HOG features involve a nontrivial computation for each image:

In [7]:

from itertools import chain
X_train = np.array([feature.hog(im)
                    for im in chain(positive_patches,
                                    negative_patches)])
y_train = np.zeros(X_train.shape[0])
y_train[:positive_patches.shape[0]] = 1

In [8]:

X_train.shape

We are left with roughly 43,000 training samples in 1,215 dimensions, and we now have our data in a form that we can feed into Scikit-Learn!
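
The figure of 1,215 dimensions can be checked by hand. The sketch below assumes that feature.hog is called with 9 orientation bins, 8×8-pixel cells, and overlapping blocks of 3×3 cells that step one cell at a time (these defaults are an assumption about the installed scikit-image version), applied to the 62×47-pixel face thumbnails. The helper hog_feature_length is hypothetical and written only for this check.

```python
def hog_feature_length(image_shape, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(3, 3)):
    """Number of HOG features for overlapping blocks that step one cell at a time."""
    # Whole cells that fit along each axis
    cells_r = image_shape[0] // pixels_per_cell[0]
    cells_c = image_shape[1] // pixels_per_cell[1]
    # Number of block positions along each axis
    blocks_r = cells_r - cells_per_block[0] + 1
    blocks_c = cells_c - cells_per_block[1] + 1
    # Each block contributes one histogram per cell it contains
    return (blocks_r * blocks_c
            * cells_per_block[0] * cells_per_block[1] * orientations)

print(hog_feature_length((62, 47)))  # 1215
```

Here a 62×47 patch holds 7×5 whole cells, which gives 5×3 block positions, each contributing 3×3 cells times 9 orientation bins.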

4. Training a support vector machine

Next we use the tools we have been exploring in this chapter to create a classifier of thumbnail patches. For such a high-dimensional binary classification task, a linear support vector machine is a good choice. We will use Scikit-Learn's LinearSVC, because in comparison to SVC it often scales better to large numbers of samples.

First, though, let's use a simple Gaussian naive Bayes to get a quick baseline:

In [9]:


from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Cross-validated accuracy of a Gaussian naive Bayes baseline
cross_val_score(GaussianNB(), X_train, y_train)

We see that, in cross-validation on our training data, even a simple naive Bayes algorithm gets us upwards of 90% accuracy. Let's try the support vector machine, with a grid search over a few choices of the C parameter:

In [10]:


from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# Cross-validated grid search over the regularization strength C
grid = GridSearchCV(LinearSVC(), {'C': [1.0, 2.0, 4.0, 8.0]})
grid.fit(X_train, y_train)
grid.best_score_

In [11]:

grid.best_params_

Let's take the best estimator and re-train it on the full dataset:

In [12]:

model = grid.best_estimator_
model.fit(X_train, y_train)

5. Find faces in a new image

Now that we have this model in place, let's grab a new image and see how the model does. We will use one portion of the astronaut image for simplicity (see discussion of this in Caveats and Improvements), and run a sliding window over it and evaluate each patch:

In [13]:

test_image = skimage.data.astronaut()
test_image = skimage.color.rgb2gray(test_image)
test_image = skimage.transform.rescale(test_image, 0.5)
test_image = test_image[:160, 40:180]

plt.imshow(test_image, cmap='gray')
plt.axis('off');

Next, let's create a window that iterates over patches of this image, and compute HOG features for each patch:

In [14]:


def sliding_window(img, patch_size=positive_patches[0].shape,
                   istep=2, jstep=2, scale=1.0):
    # Window height and width in pixels at the requested scale
    Ni, Nj = (int(scale * s) for s in patch_size)
    for i in range(0, img.shape[0] - Ni, istep):
        # Columns are bounded by the window width Nj, not its height Ni
        for j in range(0, img.shape[1] - Nj, jstep):
            patch = img[i:i + Ni, j:j + Nj]
            if scale != 1:
                patch = transform.resize(patch, patch_size)
            yield (i, j), patch

indices, patches = zip(*sliding_window(test_image))
patches_hog = np.array([feature.hog(patch) for patch in patches])
patches_hog.shape

Finally, we can take these HOG-featured patches and use our model to evaluate whether each patch contains a face:

In [15]:

labels = model.predict(patches_hog)
labels.sum()

We see that out of roughly 2,000 patches, we have found about 30 detections. Let's use the information we have about these patches to show where they lie on our test image, drawing them as rectangles:

In [16]:

fig, ax = plt.subplots()
ax.imshow(test_image, cmap='gray')
ax.axis('off')

Ni, Nj = positive_patches[0].shape
indices = np.array(indices)

for i, j in indices[labels == 1]:
    ax.add_patch(plt.Rectangle((j, i), Nj, Ni, edgecolor='red',
                               alpha=0.3, lw=2, facecolor='none'))

All of the detected patches overlap, and together they find the face in the image! Not bad for a few lines of Python.

Caveats and Improvements

If you dig a bit deeper into the preceding code and examples, you'll see that we still have a bit of work before we can claim a production-ready face detector. There are several issues with what we've done, and several improvements that could be made. In particular:

Our training set, especially for negative features, is not very complete

The central issue is that there are many face-like textures that are not in the training set, and so our current model is very prone to false positives. You can see this if you try out the above algorithm on the full astronaut image: the current model leads to many false detections in other regions of the image.

We might imagine addressing this by adding a wider variety of images to the negative training set, and this would probably yield some improvement. Another way to address this is to use a more directed approach, such as hard negative mining. In hard negative mining, we take a new set of images that our classifier has not seen, find all the patches representing false positives, and explicitly add them as negative instances in the training set before re-training the classifier.
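
As a rough sketch of hard negative mining, the two helpers below collect the false positives from a set of face-free patches and append them to the training set with a negative label. The names mine_hard_negatives, augment_training_set, and ThresholdModel are hypothetical; the stand-in classifier exists only so that the example runs without a trained model, and any object with a predict method could take its place.

```python
import numpy as np

def mine_hard_negatives(model, X_background):
    """Return the rows of X_background that the model wrongly labels as faces.

    X_background must hold features of patches known to contain no face,
    so every positive prediction on it is a false positive.
    """
    predictions = model.predict(X_background)
    return X_background[predictions == 1]

def augment_training_set(X_train, y_train, hard_negatives):
    """Append the mined false positives to the training set with label 0."""
    X_new = np.vstack([X_train, hard_negatives])
    y_new = np.concatenate([y_train, np.zeros(len(hard_negatives))])
    return X_new, y_new

class ThresholdModel:
    """Stand-in classifier for the demo: predicts 1 when the first feature > 0."""
    def predict(self, X):
        return (X[:, 0] > 0).astype(int)

X_train = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_train = np.array([1.0, 0.0])
# Features of face-free patches the classifier has not seen before
X_background = np.array([[0.5, 1.0], [-0.5, 1.0], [2.0, 3.0]])

hard = mine_hard_negatives(ThresholdModel(), X_background)
X_aug, y_aug = augment_training_set(X_train, y_train, hard)
print(hard.shape, X_aug.shape, y_aug)
```

With a real classifier, the final step would be to re-fit it on the augmented arrays, for example with model.fit(X_aug, y_aug), and possibly to repeat the mining until few new false positives turn up.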

Our current pipeline searches only at one scale

As currently written, our algorithm will miss faces that are not approximately 62×47 pixels. This can be straightforwardly addressed by using sliding windows of a variety of sizes, and re-sizing each patch using skimage.transform.resize before feeding it into the model. In fact, the sliding_window() utility used here is already built with this in mind.
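
A minimal sketch of the multi-scale idea follows. It only enumerates window boxes and does not crop, resize, or classify anything; the generator multiscale_windows, its default scales, and its step size are assumptions for illustration. Unlike a loop that stops one step short of the image border, it includes the final window position along each axis.

```python
def multiscale_windows(image_shape, patch_size=(62, 47),
                       scales=(0.5, 1.0, 2.0), step=2):
    """Yield (row, col, height, width) boxes for sliding windows at several scales."""
    for scale in scales:
        height = int(scale * patch_size[0])
        width = int(scale * patch_size[1])
        # Skip scales whose window does not fit inside the image
        if height > image_shape[0] or width > image_shape[1]:
            continue
        for row in range(0, image_shape[0] - height + 1, step):
            for col in range(0, image_shape[1] - width + 1, step):
                yield row, col, height, width

boxes = list(multiscale_windows((160, 140)))
# Distinct window sizes that were generated
print(sorted({(h, w) for _, _, h, w in boxes}))
# [(31, 23), (62, 47), (124, 94)]
```

Each box would then be cropped from the image and resized to the training patch size with skimage.transform.resize before computing HOG features, so that a single model can score windows of every size.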

We should combine overlapped detection patches

For a production-ready pipeline, we would prefer not to have 30 detections of the same face, but to somehow reduce overlapping groups of detections down to a single detection. This could be done via an unsupervised clustering approach (MeanShift Clustering is one good candidate for this), or via a procedural approach such as non-maximum suppression, an algorithm common in machine vision.
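
To illustrate the procedural route, here is a small NumPy sketch of greedy non-maximum suppression based on intersection over union (IoU). The box layout (row0, col0, row1, col1), the function name, and the 0.3 threshold are assumptions for this example, and it needs a confidence score per detection. With a LinearSVC, one plausible source of such scores is the model's decision_function output rather than the hard labels from predict.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS. boxes is (n, 4) as (row0, col0, row1, col1); returns kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with each remaining box
        r0 = np.maximum(boxes[best, 0], boxes[rest, 0])
        c0 = np.maximum(boxes[best, 1], boxes[rest, 1])
        r1 = np.minimum(boxes[best, 2], boxes[rest, 2])
        c1 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(r1 - r0, 0, None) * np.clip(c1 - c0, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Keep only boxes that do not overlap the best box too strongly
        order = rest[iou <= iou_threshold]
    return keep

# Two heavily overlapping 62x47 windows and one window far away
boxes = [(10, 10, 72, 57), (12, 12, 74, 59), (100, 80, 162, 127)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box is dropped because it overlaps the higher-scoring first box almost entirely, while the distant third box survives as a separate detection.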

The pipeline should be streamlined

Once we address these issues, it would also be nice to create a more streamlined pipeline for ingesting training images and predicting sliding-window outputs. This is where Python as a data science tool really shines: with a bit of work, we could take our prototype code and package it with a well-designed object-oriented API that gives the user the ability to use this easily. I will leave this as a proverbial "exercise for the reader".
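
As one possible starting point for that exercise, the sketch below wraps the sliding-window loop, feature extraction, and prediction in a single class. Everything here is an assumption about what such an API might look like: the class name SlidingWindowDetector, its constructor arguments, and the toy BrightPatchModel that stands in for a trained classifier so the example runs on its own.

```python
import numpy as np

class SlidingWindowDetector:
    """Bundle a feature extractor and a fitted classifier behind one detect() call."""

    def __init__(self, model, extract_features, patch_size, step=2):
        self.model = model                        # any object with .predict(X)
        self.extract_features = extract_features  # callable: patch -> 1D vector
        self.patch_size = patch_size
        self.step = step

    def windows(self, image):
        """Yield ((row, col), patch) for every window position."""
        n_rows, n_cols = self.patch_size
        for i in range(0, image.shape[0] - n_rows + 1, self.step):
            for j in range(0, image.shape[1] - n_cols + 1, self.step):
                yield (i, j), image[i:i + n_rows, j:j + n_cols]

    def detect(self, image):
        """Return the (row, col) corners of windows classified as positive."""
        corners, patches = zip(*self.windows(image))
        X = np.array([self.extract_features(p) for p in patches])
        labels = self.model.predict(X)
        return np.array(corners)[labels == 1]

class BrightPatchModel:
    """Stand-in classifier for the demo: positive when the mean feature exceeds 0.5."""
    def predict(self, X):
        return (X.mean(axis=1) > 0.5).astype(int)

# A dark 8x8 image with one bright 4x4 square in the middle
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
detector = SlidingWindowDetector(BrightPatchModel(), np.ravel,
                                 patch_size=(4, 4), step=2)
print(detector.detect(image))  # [[2 2]]
```

In the face detection setting, extract_features would be feature.hog and model the fitted LinearSVC; the design keeps both pluggable so that either can be swapped without touching the windowing logic.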

More recent advances: Deep Learning

Finally, I should add that HOG and other procedural feature extraction methods for images are no longer state-of-the-art techniques. Instead, many modern object detection pipelines use variants of deep neural networks: one way to think of neural networks is that they are an estimator which determines optimal feature extraction strategies from the data, rather than relying on the intuition of the user. An intro to these deep neural net methods is conceptually (and computationally!) beyond the scope of this section, although open tools like Google's TensorFlow have recently made deep learning approaches much more accessible than they once were. As of the writing of this book, deep learning in Python is still relatively young, and so I can't yet point to any definitive resource. That said, the list of references in the following section should provide a useful place to start!
