Remove elements from std::vector

The title of the post would suggest an answer that is just obvious and discourage you from reading it. Unfortunately I meet very often production code, that removes elements from a vector according to a certain predicate, that is written in a horrible way.

Problem definition

The problem starts with a vector of certain objects, for simplicity let us assume we are considering integers. Then we are given a predicate saying that we would like to clean the vector from the values meeting certain condition. In our case we could remove all even numbers, or prime numbers, or numbers with the sum of digits equal 9, or whatever your imagination brings. In the real world the problem could be more sophisticated and we could remove all the connections that were disconnected and there is no reason to maintain them further.

How it usually looks like

What programmers usually do, they iterate over the collection and check every element if it meets a certain condition. If no, the element should be removed. This way of thinking is not bad, however when it comes to implementation, the things usually go wrong. Here is what most of the developers I know usually do:

for(auto i = v.begin(); i != v.end(): i++)
{
   if(predicate(*i))
   {
      i = v.erase(i);
   }
}

Now, you could ask what is wrong with this code. I would answer with another question: what is the complexity? At first glance it might look like it is O(n), but when you think what erase actually does, it appears that it is actually O(n^2). The erase operation is O(n) costly as it has to rearrange the vector after removing elements. We are even informed about this, when we take a more careful look at the standard:

Because vectors use an array as their underlying storage, erasing elements in positions other than the vector end causes the container to relocate all the elements after the segment erased to their new positions. This is generally an inefficient operation compared to the one performed for the same operation by other kinds of sequence containers (such as list or forward_list).

To give you some intuition about what happens, let us imagine the vector that looks like following

[ 5 | 4 | 7 | 8 | 1 | 3 | 6 ] 

We want to remove the element 8. When this happens, all the elements on the right hand from 8 has to be shifted left.

[ 5 | 4 | 7 | x | 1 | 3 | 6 ] 
[ 5 | 4 | 7 | 1 | x | 3 | 6 ] 
[ 5 | 4 | 7 | 1 | 3 | x | 6 ] 
[ 5 | 4 | 7 | 1 | 3 | 6] 

What is O(n) operation. Considering the outer loop we fall into O(n^2) complexity, that is not acceptable as long as we may get something better and in this case we can.

How it should look like

The STL library provides you a set of algorithms that make the operations on the containers easier. One of them is the remove_if operation, that accepts two iterators which indicate the range that should be considered, and the mentioned predicate. The remove_if approach is way more clever than the loop/erase one. L What the specification says about remove_if is:

The removal is done by replacing the elements for which pred returns true by the next element for which it does not, and signaling the new size of the shortened range by returning an iterator to the element that should be considered its new past-the-end element.

To clarify it even more, let us take a look at the example. Let us consider the vector from the previous example and see how remove_if would work when the predicate is “remove if the number is divisible by 4”.

[ 5 | 4 | 7 | 8 | 1 | 3 | 6 ] 
[ 5 | 7 | 1 | 3 | 6 | 4 | 8 ] 
-------------------- xxxxxxxx
       good            bad

As we can see the elements that meet the predicate have been moved to the end of the array. Secondly the remove_if returns the iterator indicating the position after “the last good element” (6 in this case), that we may use to remove the range of “bad elements”. Before we will do it, it is worth to mention that the order of “good elements” is maintained, but it is not necessary the case for “bad elements”.

Finally we could get a code that looks like follows:

v.erase(
   remove_if(v.begin(), v.end(), [](const int & in){return i % 4 == 0;}),
   v.end());

Yes, we still use the erase method, but this time we are removing from the end of the vector so there is no necessity of rearrangement the elements and the erase operation is O(1). In this case the complexity would be O(n), what is quite satisfactory in comparison to O(n^2).

Summary

I think that this article clearly shows how important it is, to understand well the mechanism behind Standard Template Library. It is a really powerful tool and knowing how to use it properly makes your life way easier and your programs way faster.

Gesture Recognition System

Introduction

gesture
The article below presents the results of one of the parts of my master thesis. The purpose of the thesis was to research methods that are applicable for move tracking and gesture recognition and a try of building a working system. Firstly the idea of the study was to apply gathered knowledge for sign-language translation. Unfortunately the lack of time (4 months) and hardware limitation allowed only for proof of concept’s implementation. Although the system is good enough for human-computer interaction.

Assumptions

Unfortunately the real world is to complicated to be described well within a computer application. A common technique while implementing complex systems is to make assumptions that simplify the real problem to a simpler one. Also in this case I made some simplification that limited users behavior but allowed me to build a working system within reasonable time. Therefore the main assumption I did:
– Only one object moves at time.
– We consider only moves in 2D plane.
– Constant source of light.
– Camera doesn’t move.

The first assumption explains by itself. Just to be precise, having just one moving objects removes the necessity of deciding what is the object that actually needs to be traced, or tracing numbers of objects. The second one tells that depth is not considered. A very important assumptions was the last one. As the segmentation was colored based and I was working only with a very color sensitive web-camera I couldn’t allow on working with sun-light. Even little changes of the light angle or saturation changes the color perceived by the camera too much.

The assumptions are just some abstractions that are used mainly for simplifying the image-processing part and they could be chosen freely as long as they don’t violate the system specification. The main point of them is to skip some of the problems that are not the crucial part of the system, and concentrate on what really matters.

Segmentation

The method I decided to use for segmentation uses the colors histogram approach. Before tracing starts, there is one extra preprocessing step (you may check it on the video), where I select the area to take a sample of colors to trace. The color value appearing more often in the sample has larger value in the histogram. Then I propagate back the histogram to the image, but this time in the grayscale. The bigger value of the color, the more white it becomes on the output image (the other window on the left). It is worth mentioning that in this case of original image I used HSV model rather than RGB.

Picture 1. Click to see the movie presenting segmentation.

Picture 1. Click to see the movie presenting segmentation.

The second step of the segmentation is to abandon elements that are not interesting for the application. Even if we made an assumption that there is only one object moving at time, it’s almost impossible to keep your head not moving at all. It introduces some problems as it has similar color histogram as hands and may confusing the tracing algorithm. Moreover using complex head/face detection approaches using boosting approaches didn’t make sense as they are computationally expensive and I simply didn’t need them. Nevertheless the head was not the only problem. There could be some other object with matching color histogram in the area. Luckily the solution described below solved the issue.

What was done in this case, based on the observation that heads moves way slower than hands (in the best case doesn’t move at all). Thus the consecutive frames are compared and pixels that don’t change much are abandoned. The comparison is done by substracting pixels coming from different frames and comparing them to the threshold defined on the base of experiments.

Tracing

The tracing algorithm has to met some requirements that are coming directly from the nature of the moves: they don’t have to be continues and linear. This makes the tracing problem a little bit more complicated as in case of instant change of the direction, the tracing algorithm has to notice it quickly and not to loose the followed object. Another thing is that the tracing object might be occluded by some other objects. In our case this problem occurs when we move hand above head. We have eliminated this problem partially in the previous step, but it was not sufficient – we managed not to trace the head, but there were difficulties of losing the hand when it appeared over the head.

Picture 2. Click to see the movie presenting tracing.

Picture 2. Click to see the movie presenting tracing.

After the SLR (Systematic Literature Review) as the first part of the thesis, I decided to apply two strategies described in the literature: Meanshift algorithm for tracing and Kalman Filter for predicting the position of the hand in the next frame.

Meanshift algorithm has been applied since 1998 in tracking problems successfully. The formal basis of the method was described by Fukunaga and Comaniciu at [1]. However the main idea might be described as follows:

1. Choose the window
a) set the initial position of the window
b) choose the density function type
c) choose the shape of the window
d) choose the size of the window.
2. Count the center of mass.
3. Move the window to the center of mass.
4. Move back to point 2.

As we can expect I associated the back-propagation method with the Meanshift algorithm – the window is moved to the point that is the center of the window, that suits the color histogram the most.

Kalman Filter was firstly described in 1960. The precise explanation of the method might be found in Welsh and Bishop [2] paper. The point of the approach is to keep the information about previous measures and maximize the a-posteriori probability of the position’s prediction. But instead of keeping a long list of previous measures we are creating a model that is iteratively updated.

Gesture representation

Picture 3. The shape recognition step. Click to watch the movie.

Picture 3. The shape recognition step. Click to watch the movie.

The representation of a gesture is a sequence of directions and shapes. A direction is represented using the idea taken from the compass rose with the eight principal winds, except the winds are vectors outgoing from the center. Every direction has assigned a number from 1 to 8 and describes the direction of the last move.

The shape is one of the predefined shapes of the hand. The system gathers the contours of the hand and uses a SVM (Support Vector Machine) classifier to obtain related label. The contours were represented using Freeman Chain Codes.

Finally a gesture could look like a sequence of pairs [1,’A’], [1, ‘B’] , [2, ‘B’], [, ‘STOP’]. Notice that when the ‘STOP’ shape is met, the direction doesn’t matter.

Recognition

Hidden Markov Model has been applied for gesture recognition. It was trained using some set of predefined actions that were then translated to a language word. Every sequence of movements was ending with a special move meaning “end of gesture”. It simplified the processing a lot, but it is not a very good practice and “don’t try it at home” 😉 The results might be seen on the last video.

Training

The training was split into two parts. Firstly I was learning SVM just for the shape recognition. When it was done, I approached the second step of learning the HMMs using outputs from SVMto describe the shape.

Conclusion

Even if the system I implemented was very limited, it proved the concepts of the features choice and the tracing algorithms as strong enough for a human-computer interaction. Nevertheless the segmentation method has to be improved as the current one is a home-made solution and the assumptions it takes are too strong. Secondly the language processing shouldn’t require the “stop word” and should be done fully dynamically.

References

[1]. K. Fukunaga, “Introduction to Statistical Pattern Recognition”, Boston, Academic Press, 1990
[2]. G. Welsh, G. Bishop “An introduction to Kalman filter” (Technical Report TR95-041), University of North Carolina, Chapel Hill, NC, 1995
[3]. D. Comaniciu and P. Meer, “Mean shift analysis and applications”, IEEE Internation Conference on Computer Vision (vol 2, p. 1197), 1999.