Boosting color-based tracking

Tracking an object over a video sequence is a classic problem in computer vision. Given a model of the object, a tracking algorithm looks for the object in every new frame. The model holds information about the object: what it looks like, how it moves, or any other feature that helps to spot it. A very simple model, yet effective for many problems, is the object's color histogram. The histogram represents the probability distribution of colors on the object, so a color with a high histogram value is likely to be found on the object, and vice versa.

There are tons of tutorials on color-based tracking that use the histogram back-projection to find the pixels in the image that are most likely to be object pixels. Then, they use MeanShift or CAMShift (Continuously Adaptive MeanShift) to track the object region. The histogram back-projection generates a new image in which every pixel contains the histogram value for the color of the corresponding pixel in the input image (Fig. 1). Intuitively, the idea is simple: if there is a red pixel and a blue pixel in the image, and my object has a lot of red pixels but only a few blue pixels, the red pixel has a higher probability of belonging to the object than the blue pixel. While this is a fair approximation, it would make Bayes turn in his grave. Looking at the problem from a probabilistic point of view explains why.

Fig. 1: Original image (left) and normalized histogram back-projection (right). Object model: 64x64-bin histogram of the hue and saturation channels of the area outlined in green.
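
As a rough sketch of what the back-projection does, assume a single 8-bit channel and a 1-D histogram instead of the 2-D hue/saturation histogram of Fig. 1. Under that simplification, the operation boils down to a table lookup. The names `binIndex` and `backProject` are made up for this illustration and are not OpenCV functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps an 8-bit channel value to one of `bins` equal-width bins.
inline int binIndex(std::uint8_t value, int bins) {
    return static_cast<int>(value) * bins / 256;
}

// Back-projection of a 1-D histogram: every output pixel receives the
// histogram value of the bin that the corresponding input pixel falls into.
std::vector<float> backProject(const std::vector<std::uint8_t>& pixels,
                               const std::vector<float>& histogram) {
    assert(!histogram.empty());
    const int bins = static_cast<int>(histogram.size());
    std::vector<float> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        out.push_back(histogram[binIndex(p, bins)]);
    }
    return out;
}
```

For instance, with a 4-bin histogram {0.1, 0.7, 0.2, 0.0}, a pixel of value 100 falls into bin 1 and is assigned 0.7.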

Probabilistic formulation

After normalization, each bin value in the color histogram expresses the probability that a pixel has a color in bin B, given that it is a pixel of object O. We can express this as the conditional probability p(c(x, y) ∈ B | (x, y) ∈ O), where c(x, y) is the color of pixel (x, y). The distribution can be abbreviated as P(B | O). However, we are looking for the opposite: the probability that a pixel in the image belongs to object O, given that it has a color in bin B, or P(O | B). By Bayes' theorem:

$$P(O \mid B) \cdot P(B) = P(B \mid O) \cdot P(O)$$

$$P(O \mid B) = \frac{P(B \mid O) \cdot P(O)}{P(B)}$$

Each pixel (x, y) in the histogram back-projection has a value of P(B | O). By directly taking this value as a measure of how likely the object is to be found at that pixel, we are assuming that P(O | B) = P(B | O) and, thus, we are neglecting P(O) and P(B). Why does it work, then? P(B | O) usually works as an approximation of P(O | B) because:

a) The object can usually be found at any position in the image, so it is acceptable to assume that P(O) is uniform, i.e. P(O) is the same for all pixels.

b) It is assumed that all colors can be found in the scene with equal probability and, therefore, P(B) is uniform, too.

If these conditions hold true, we can forget about P(O) / P(B) because it is a constant that does not change the location of the extrema of P(O | B). This approximation is used in most tutorials about color-based tracking with MeanShift or CAMShift, and it causes problems that can sometimes be avoided.
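
Written out, the argument looks as follows. The symbols k and B(x, y), the bin of the color at pixel (x, y), are introduced here only for illustration:

```latex
\begin{align*}
P(O \mid B) &= \frac{P(O)}{P(B)} \cdot P(B \mid O) = k \cdot P(B \mid O),
  \qquad k = \frac{P(O)}{P(B)} > 0 \ \text{constant under (a) and (b)} \\
\arg\max_{(x, y)} P\bigl(O \mid B(x, y)\bigr) &= \arg\max_{(x, y)} P\bigl(B(x, y) \mid O\bigr)
\end{align*}
```

As soon as P(B) differs from bin to bin, k is no longer a constant and the second line stops being true in general.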

The problem of assuming that P(B) is uniform

Let's assume that our target object is mainly blue, with black and yellow details. If it is in front of a black background, we will be able to track it perfectly (Fig. 2). Background pixels will have a low object probability because black has a low weight in our object model.

Fig. 2: Original sequence (left) with a blue object with black and yellow details. The object is tracked with MeanShift and the tracking window is depicted as a white rectangle. The right part is the sequence of back-projections using the object histogram.

However, if some part of the background is blue, the tracking will be lost when the object passes over it (Fig. 3). That part of the background will have a high probability, since its color is the one mainly found on the target object. This happens even though the object has other colors, like yellow, that are seen nowhere else.

Fig. 3: The tracking is lost when the object passes over the blue background (left) because the background blue part is assigned a high probability in the back-projection image (right).

Enhancing the color model

The object in Fig. 3 is lost because the background has similar colors. However, we could benefit from the fact that the object has colors that are very specific to it. Although P(B | O) is low for these colors, P(B) is also low for them because they are rarely found in the whole scene, so the ratio P(B | O) / P(B) is high. Therefore, taking P(B) into account when computing P(O | B) boosts the importance of these rare colors within the object model and improves the tracking performance (Fig. 4).

Fig. 4: The rare yellow feature of the object gets higher importance in the model, which keeps the tracking from failing.
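
To see the effect with numbers, here is a minimal sketch that works on plain bin counts rather than OpenCV histograms. The function names `toProbabilities` and `boostedWeights` are hypothetical, and the counts in the usage note below are invented for illustration, not measured from the sequences in the figures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalizes raw bin counts so that they sum to 1 (a discrete probability distribution).
std::vector<double> toProbabilities(const std::vector<double>& counts) {
    double total = 0.0;
    for (double c : counts) total += c;
    assert(total > 0.0);
    std::vector<double> p;
    p.reserve(counts.size());
    for (double c : counts) p.push_back(c / total);
    return p;
}

// Per-bin weight P(B | O) / P(B), which is proportional to P(O | B)
// as long as P(O) is assumed to be uniform.
// Bins never seen in the scene get weight 0 to avoid dividing by zero.
std::vector<double> boostedWeights(const std::vector<double>& objectCounts,
                                   const std::vector<double>& sceneCounts) {
    assert(objectCounts.size() == sceneCounts.size());
    const std::vector<double> pBgivenO = toProbabilities(objectCounts);
    const std::vector<double> pB = toProbabilities(sceneCounts);
    std::vector<double> weights(pB.size(), 0.0);
    for (std::size_t i = 0; i < pB.size(); ++i) {
        if (pB[i] > 0.0) weights[i] = pBgivenO[i] / pB[i];
    }
    return weights;
}
```

With made-up counts of {80, 12, 8} on the object and {2000, 7992, 8} in the whole scene for the blue, black and yellow bins, the weights come out at roughly 4, 0.15 and 100. Yellow, which has the lowest P(B | O) of the three, ends up with the highest weight because it is almost never seen outside the object.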

Feel free to check the source code to see how the tracking in Fig. 4 was done. OpenCV was used to compute the object histogram and to run MeanShift as usual. In addition, a cumulative histogram of the whole scene is accumulated over time to estimate the prior distribution of the color bins, P(B). Each bin in the object's color histogram is then divided by the corresponding bin in the global scene histogram, so the probability map for the object is estimated more accurately. The following code fragment is where the magic happens:

```cpp
#include <opencv2/imgproc.hpp>

using namespace cv;

Mat objectHistogram;
Mat globalHistogram;

// objectPosition and objectSize are assumed to be defined elsewhere in the tracker.
void getObjectHistogram(Mat frame)
{
    const int channels[] = { 0, 1 };
    const int histSize[] = { 64, 64 };
    float range[] = { 0, 256 };
    const float *ranges[] = { range, range };

    // Histogram in object region
    Mat objectROI = frame(Rect(objectPosition, Size(objectSize.x, objectSize.y)));
    calcHist(&objectROI, 1, channels, noArray(), objectHistogram, 2, histSize, ranges, true, false);

    // A priori color distribution with cumulative histogram (accumulate = true)
    calcHist(&frame, 1, channels, noArray(), globalHistogram, 2, histSize, ranges, true, true);

    // Boosting: divide conditional probabilities in object area by a priori probabilities of colors
    for (int y = 0; y < objectHistogram.rows; y++) {
        for (int x = 0; x < objectHistogram.cols; x++) {
            const float prior = globalHistogram.at<float>(y, x);
            // Skip empty bins: 0 / 0 would put NaN into the histogram
            if (prior > 0.0f) {
                objectHistogram.at<float>(y, x) /= prior;
            }
        }
    }
    normalize(objectHistogram, objectHistogram, 0, 255, NORM_MINMAX);
}
```