Analysing complexity of recursive functions is not always easy, especially when the function does something fancy. Luckily, clever brains have provided a cookbook that makes determining the complexity of a recursive algorithm quite easy.

We may consider a recursive function as:

Let me explain the symbols. R is a recursive function, that is called on the problem of size N. As it’s recursive, it calls itself A times, for a smaller problem of size N/B. In many recursive implementation, especially divide and conquer techniques, we got a “merge” step (please think about mergesort). The complexity of the merge step is expressed as the last term of the above formula.

Then we may determine three cases that will describe the complexity of the call. 1. then 2. then 3. then

Do you remember mergesort? Mergesort has two recursive calls, that divide the problem into two pieces. When the recursion is done, then comes an extra “merge” step, that has O(N) complexity. If we plug this information into the formula we got: A = 2, B = 2, C = 1. What leads to 2 = 2 ^ 1. We take the first case, and obtain complexity of .

We are given a sorted array, value V, and a number K. The value represents the number that may appear in the array, but it’s not mandatory. We are expected to find a set of size K, that contains the elements from the array, that are the closest to the value .

The solution uses the famous binary search algorithm. Firstly we find the value V itself, or the closest to V, in the array. The latter, we obtain just as a result of a regular binary search, that returns the position of the element if it exists, or the position where it would appear. Let’s call this position P. We know that the element at position P, will be included in the result set. Now we need to consider the elements on the left and the right hand side of P. As the array is sorted, the closest element to the one on the position P, will lay on the position P-1, or P+1. We choose the one that is closer, and iteratively process till we fill the set to the expected size. An example code:

unsigned int binary_search(vector<int> array, int el){
unsigned int begin = 0;
unsigned int end = array.size() - 1;
unsigned int medium;
while(begin < end){
medium = begin + (end - begin)/2;
if(array < el)begin = medium + 1;
else end = medium;
}
return medium;
}
vector<int> get_k_closest(vector<int> array, int el, unsigned int k){
unsigned int split = binary_search(array, el);
int l = split, r = split + 1;
unsigned int counter = k;
vector<int> result;
while(counter){
if(l >= 0 && r < array.size()){
// choose the element that is closer
if(abs(array[l] - el) < abs(array[r] - el))
result.push_back(array[l--]);
else result.push_back(array[r++]);
} else {
if(l >= 0)
result.push_back(array[l--]);
else if(r < array.size())
result.push_back(array[r++]);
}
counter--;
}
return result;
}

A sorted array is a great thing, as it is a base for numbers of efficient algorithms. However it may appear that you are given an array that is sorted, but not completely. Below you may find two interesting problems, that receive almost sorted array as an input.

Find an element in almost sorted array.
To be more precise, in this case we consider an array, that was sorted, but some of its elements were swapped between adjacent cells. So the number lays in a desired position, or the distance to the desired position is equal 1. As an example let’s take a look at the array [1,4,2,5,7,6].

Obviously we could make an linear scan from left to right, and obtain the result in O(N) time. However we can go better. As we could expect, we gonna use binary search. Notice, that if we are looking for element E, it might be located in one of three cells. Let P(E) represents the position of E in a sorted array, then in our case we need to consider positions P(E) – 1, P(E), P(E) + 1. Position P(E) is going to be determined, by using regular binary search. Please find an example C++ implementation below:

int find(vector<int> array, int el){
if(!array.size())return -1;
int begin = 0, end = array.size() - 1;
while(begin < end){
int medium = b + (e - b) / 2;
// check three places instead of one
if(array == el)return medium;
else if(medium > begin && array == el)return medium - 1;
else if(medium < end && array == el)return medium + 1;
// choose which way to go
if(array < el)
begin = medium + 2; // as we checked medium + 1 already
else end = medium - 2; // as we checked medium - 1 already
}
}

Sort almost sorted array.
As you may remember, insertion sort has the best performance when the input array is almost sorted. It provides O(N * K) complexity, where K is the maximum distance between elements and they desired position. When K is small, then the performance would be very good, however when K grows near to N, that we are ending in O(N ^ 2) complexity. However if we know the K, we can proceed in a similar way to smoothsort. In the first step we would build a min-heap of K first elements from the array. Then we may scan the array from left to right and extract the minimum from the heap to insert it into the proper position into the array. We also need to remember, to update the heap with the next element. An example implementation could look like follows:

void sort(vector<int> & array, unsigned int k){
priority_queue<int, vector<int>, greater<int>>
heap(array.begin(), array.begin() + k);
for(unsigned int i = 0; i < array.size(); i++){
array[i] = heap.top(); heap.pop();
if(i + k < array.size())
heap.push(array[i + k]);
}
}

This solution performs better than insertion sort, as in this case, building the heap mainly adds the new element to the end of it. The same about the access and remove operation from the top of the heap. In the other words, the heap doesn’t require many rearrangements. It happens because we process almost sorted array, and only from time to time, we need to correct the heap structure. This leads to O(N * log K) algorithm, that behaves even better than insertion sort. However it requires extra memory O(K).

There is a stream of numbers, that has unknown size. We should process the stream in such a way, that at every moment we should be able to return the median of currently processed numbers.

For understanding what median is, please refer to the following wiki article. The simplest solution would be to add incoming numbers to a data structure, like c++ vector, and sort it every time, we are asked for the median. It wouldn’t be that bad, as with quicksort we could get O(N log N) complexity, where N is the size of currently read data.

However there is a better solution, that provides the median instantly, but requires some extra step while processing incoming numbers. Let’s imagine that we maintain the split into two parts in such a way that the difference between sizes of splits is not greater than 2. The left part contains smaller numbers, and the right larger. In the other words, every element from the left part is smaller than any element from the right part. How would we calculate the median now? If the left and right parts have equal size, we can just take the maximum element from the left MAXL part and minimum from the right MINR. Then the median = (MAXL + MINR)/2. If the array size differs by one, we take MINL if the left part is bigger, or MINR otherwise.

But then the important question appears: how to maintain the same size of both parts?. It’s quite simple, if the left part is bigger than the right one, by more than one element, we take the MAXL and move it to the right part. It becomes the new MINR. For the right part we proceed in the same way. I mean we move MINR to the left, that becomes the new MAXL. But how do we reculcate the new MINL, when we moved it to the right?

Here is where the data structures comes in. Let’s use a max heap to keep the left part and min heap to keep the right part. Then we get exactly what was described in the previous paragraph, without to much implementation and all the requirements are met. An example c++ code could look like:

priority_queue<int, vector<int>, less<int>> left; // left heap
priority_queue<int, vector<int>, greater<int>> right; // right heap
while(true){
int number;
cin >> number; // process the stream
// push to the proper heap
if(right.empty() || number >= right.top())
right.push(number);
else left.push(number);
// remain the size difference
if((int)left.size() - (int)right.size() > 1)
right.push(left.top()); left.pop();
else if((int)right.size() - (int)left.size() > 1){
left.push(right.top()); right.pop();
// get the median
int median = 0;
if(left.size() == right.size())
median = (left.top() + right.top())/2;
else if (left.size() > right.size())
median = left.top();
else median = right.top();
// do with the median... something
{...}
}

In the quicksort algorithm the most important part is the partition step. The implementation of it, is actually not difficult, however some implementation store data on the both sides of the array, instead of just one, what leads to the more complex and error prone algorithms. In this article I would like to show that for n-partition, we need n – 1 storage indexes.

In the pivot-partitioning we just move the smaller than pivot elements on the begining of the array. We can imagine it as a stack that is growing within the array size, from the bottom to the top. Secondly, to trace the top of the stack we use a variable called storagePlace, that indicates the size of the stack and the next place where to insert an element.

void partition(std::vector<int> & v){
int pivot = v[0]; // simplified
swap(v[0], v[v.size() - 1]);
unsigned int storagePlace = 0;
for(int i = 1; i < v.size() - 1; i++)
if(v[i] < pivot)swap(v[i], v[storagePlace++]);
swap(v[storagePlace], v[v.size() - 1]);
}

There is a famous problem of dutch flag described firstly by Dijkstra. Imagine that we have 3 kind of elements, let’s say negative numbers, 0s, positive numbers. We want to partition the array in the way that negative numbers go first, then 0s and finally positive. A while ago we mentioned that we can imagine pivoting as maintaining a stack within the array. We extend this idea to maintain two stacks now: one grows from the bottom to the top, and the other from the top to the bottom. The former stores negative numbers, the latter positives.

void dutch_flag_partition(std::vector<int> & v){
int top = v.size() - 1;
int bottom = 0;
for(int i = 0; i <= top; i++){
if(v[i] < 0)swap(v[bottom++], v[i]);
else if(v[i] > 0)swap(v[top--], v[i]);
}
}

In this case we might be sure that the elements between [0, bottom) are already the negative numbers, between [bottom, i) we have only 0, between (top, v.size()) we have positive elements. The uknown part remains between [i, top].

Both solution offer O(n) complexity for the problem. In the second case there is no stable solution that would have O(n) complexity. However if you need to keep it stable, take a look at this article.

We are given two arrays of integers and both of them are sorted. Find the algorithm to determine the median faster than O(N log N).

Let the arrays be called A1 and A2, and have size N and M respectively. If we would put the numbers into one array, then we would know that the median will be located at (N + M + 1)/2 if N + M is odd, otherwise we would take the ((N + M)/2 + (N + M + 1)/2)/2. However we won’t put them into one array. Instead we will browse them simultaneously.

int median(vector<int> a1, vector<int> a2){
int i1, i2;
i1 = i2 = 0;
int merged_size = a1.size() + a2.size();
int last, last_last;
last = max(a1[0], a2[0]);
last_last = min(a1[0], a2[0]);
while(i1 + i2 < ((merged_size + 1)/2)){
last_last = last;
if(i1 < a1.size() && a[i1] < a[i2]){
last = array[i1];
i1++;
}
else if(i2 < a2.size()){
last = array[i+1]
i2++;
}
}
if(merged_size % 2)
return last; // the middle element
else return (last + last_last)/2;
}

Thanks to using two extra variables we didn’t have to introduce stack or any other structure that would monitor the order of elements. Thanks to that the memory complexity of the problem is constant.

There is a huge set S of numbers of size N that can’t fit in the memory. Choose K of elements from the stream randomly, with equal probability of being selected.

Without to much introduction, let’s just say that the answer to the problem is Reservoir Sampling. An example code could look like:

std::vector<int> reservoir_sampling(Stream S; unsigned long long N, unsigned int K){
unsigned long long i = 0;
std::vector<int> reservoir;
// fill the reservoir
for(; i < K; ++i)
reservoir.push_back(S.next());
// replace if needed
for(; i < N; ++i){
int next_token = S.next();
unsigned int r = rand() % i;
if(r < K)reservoir[i] = next_token;
}
return reservoir;
}

You are given a number that might be so big that it doesn’t fit in any standard type, however the number is provided in a form of [a, b, c, d, …] where every letter stands for the bit number that is set in the byte representation, so you can read it easily. In example [1, 3, 4] represents 1101 = 13. Then you multiply the number by 3. How many bits are set to 1 in the binary representation of the result?

This is a real interview question that has been asked to a friend of mine quite recently. To figure out the solution, we need to firstly consider what the multiplication by 3 really means. Let’s take an example.
01101
x00011
--------
01101
+11010
--------
0100111

As you can see the result is the sum of the number itself and the number shifted to left by 1. The former is obvious, as it is the result of multiplying by 1. The latter is what multiplication by 2 does. Multiplication by 2^n shifts the byte left by n positions in a general case.

Coming back to our result we get [1, 3, 4] x 3 = [1, 3, 4] << 1 + [1, 3, 4] = [2, 4, 5] + [1, 3, 4]. Now we may answer the question: what bytes will be set after this addition? Recalling the binary arithmetic and we may notice that the solution lays in calculating the result of the addition of the. In simpler words we just need to implement binary addition of numbers represented in the form of [2,4,5] and [1,3,4]. The answer will be the size of the result structure.

int getNumberOfSetBits(vector<int> number)
{
// calculate the number shifted by 2
vector<int> shifted = number;
for(unsigned int i = 0; i < number.size(); i++)
shifted[i]++;
bool carry = false;
vector<int> result;
// the first element of number will surely be a part of the result
result.push_back(number[0]);
// start the addition
for(unsigned int i = 1; i < number.size(); i++)
{
if(number[i] == shifted[i-1])
{
if(carry)
{
result.push_back(number[i]);
}
carry = true;
}
else
{
result.push_back(number[i]);
result.push_back(shifted[i-1]);
carry = false;
}
}
// do not forget the carry the last one from shifted
int last = shifted.size() - 1;
if(carry)
result.push_back(shifted[last] + 1);
else result.push_back(shifted[last]);
return result.size();
}

The code could be optimized as actually the structure result is not necessary. But it keeps more visible what was the idea behind the algorithm.

The title of the post would suggest an answer that is just obvious and discourage you from reading it. Unfortunately I meet very often production code, that removes elements from a vector according to a certain predicate, that is written in a horrible way.

Problem definition

The problem starts with a vector of certain objects, for simplicity let us assume we are considering integers. Then we are given a predicate saying that we would like to clean the vector from the values meeting certain condition. In our case we could remove all even numbers, or prime numbers, or numbers with the sum of digits equal 9, or whatever your imagination brings. In the real world the problem could be more sophisticated and we could remove all the connections that were disconnected and there is no reason to maintain them further.

How it usually looks like

What programmers usually do, they iterate over the collection and check every element if it meets a certain condition. If no, the element should be removed. This way of thinking is not bad, however when it comes to implementation, the things usually go wrong. Here is what most of the developers I know usually do:

for(auto i = v.begin(); i != v.end(): i++)
{
if(predicate(*i))
{
i = v.erase(i);
}
}

Now, you could ask what is wrong with this code. I would answer with another question: what is the complexity? At first glance it might look like it is O(n), but when you think what erase actually does, it appears that it is actually O(n^2). The erase operation is O(n) costly as it has to rearrange the vector after removing elements. We are even informed about this, when we take a more careful look at the standard:
Because vectors use an array as their underlying storage, erasing elements in positions other than the vector end causes the container to relocate all the elements after the segment erased to their new positions. This is generally an inefficient operation compared to the one performed for the same operation by other kinds of sequence containers (such as list or forward_list).

To give you some intuition about what happens, let us imagine the vector that looks like following

[ 5 | 4 | 7 | 8 | 1 | 3 | 6 ]

We want to remove the element 8. When this happens, all the elements on the right hand from 8 has to be shifted left.

What is O(n) operation. Considering the outer loop we fall into O(n^2) complexity, that is not acceptable as long as we may get something better and in this case we can.

How it should look like

The STL library provides you a set of algorithms that make the operations on the containers easier. One of them is the remove_if operation, that accepts two iterators which indicate the range that should be considered, and the mentioned predicate. The remove_if approach is way more clever than the loop/erase one. L What the specification says about remove_if is:
The removal is done by replacing the elements for which pred returns true by the next element for which it does not, and signaling the new size of the shortened range by returning an iterator to the element that should be considered its new past-the-end element.

To clarify it even more, let us take a look at the example. Let us consider the vector from the previous example and see how remove_if would work when the predicate is “remove if the number is divisible by 4”.

As we can see the elements that meet the predicate have been moved to the end of the array. Secondly the remove_if returns the iterator indicating the position after “the last good element” (6 in this case), that we may use to remove the range of “bad elements”. Before we will do it, it is worth to mention that the order of “good elements” is maintained, but it is not necessary the case for “bad elements”.

Finally we could get a code that looks like follows:

v.erase(
remove_if(v.begin(), v.end(), [](const int & in){return i % 4 == 0;}),
v.end());

Yes, we still use the erase method, but this time we are removing from the end of the vector so there is no necessity of rearrangement the elements and the erase operation is O(1). In this case the complexity would be O(n), what is quite satisfactory in comparison to O(n^2).

Summary

I think that this article clearly shows how important it is, to understand well the mechanism behind Standard Template Library. It is a really powerful tool and knowing how to use it properly makes your life way easier and your programs way faster.

The article below presents the results of one of the parts of my master thesis. The purpose of the thesis was to research methods that are applicable for move tracking and gesture recognition and a try of building a working system. Firstly the idea of the study was to apply gathered knowledge for sign-language translation. Unfortunately the lack of time (4 months) and hardware limitation allowed only for proof of concept’s implementation. Although the system is good enough for human-computer interaction.

Assumptions

Unfortunately the real world is to complicated to be described well within a computer application. A common technique while implementing complex systems is to make assumptions that simplify the real problem to a simpler one. Also in this case I made some simplification that limited users behavior but allowed me to build a working system within reasonable time. Therefore the main assumption I did:
– Only one object moves at time.
– We consider only moves in 2D plane.
– Constant source of light.
– Camera doesn’t move.

The first assumption explains by itself. Just to be precise, having just one moving objects removes the necessity of deciding what is the object that actually needs to be traced, or tracing numbers of objects. The second one tells that depth is not considered. A very important assumptions was the last one. As the segmentation was colored based and I was working only with a very color sensitive web-camera I couldn’t allow on working with sun-light. Even little changes of the light angle or saturation changes the color perceived by the camera too much.

The assumptions are just some abstractions that are used mainly for simplifying the image-processing part and they could be chosen freely as long as they don’t violate the system specification. The main point of them is to skip some of the problems that are not the crucial part of the system, and concentrate on what really matters.

Segmentation

The method I decided to use for segmentation uses the colors histogram approach. Before tracing starts, there is one extra preprocessing step (you may check it on the video), where I select the area to take a sample of colors to trace. The color value appearing more often in the sample has larger value in the histogram. Then I propagate back the histogram to the image, but this time in the grayscale. The bigger value of the color, the more white it becomes on the output image (the other window on the left). It is worth mentioning that in this case of original image I used HSV model rather than RGB.

Picture 1. Click to see the movie presenting segmentation.

The second step of the segmentation is to abandon elements that are not interesting for the application. Even if we made an assumption that there is only one object moving at time, it’s almost impossible to keep your head not moving at all. It introduces some problems as it has similar color histogram as hands and may confusing the tracing algorithm. Moreover using complex head/face detection approaches using boosting approaches didn’t make sense as they are computationally expensive and I simply didn’t need them. Nevertheless the head was not the only problem. There could be some other object with matching color histogram in the area. Luckily the solution described below solved the issue.

What was done in this case, based on the observation that heads moves way slower than hands (in the best case doesn’t move at all). Thus the consecutive frames are compared and pixels that don’t change much are abandoned. The comparison is done by substracting pixels coming from different frames and comparing them to the threshold defined on the base of experiments.

Tracing

The tracing algorithm has to met some requirements that are coming directly from the nature of the moves: they don’t have to be continues and linear. This makes the tracing problem a little bit more complicated as in case of instant change of the direction, the tracing algorithm has to notice it quickly and not to loose the followed object. Another thing is that the tracing object might be occluded by some other objects. In our case this problem occurs when we move hand above head. We have eliminated this problem partially in the previous step, but it was not sufficient – we managed not to trace the head, but there were difficulties of losing the hand when it appeared over the head.

Picture 2. Click to see the movie presenting tracing.

After the SLR (Systematic Literature Review) as the first part of the thesis, I decided to apply two strategies described in the literature: Meanshift algorithm for tracing and Kalman Filter for predicting the position of the hand in the next frame.

Meanshift algorithm has been applied since 1998 in tracking problems successfully. The formal basis of the method was described by Fukunaga and Comaniciu at [1]. However the main idea might be described as follows:

1. Choose the window
a) set the initial position of the window
b) choose the density function type
c) choose the shape of the window
d) choose the size of the window.
2. Count the center of mass.
3. Move the window to the center of mass.
4. Move back to point 2.

As we can expect I associated the back-propagation method with the Meanshift algorithm – the window is moved to the point that is the center of the window, that suits the color histogram the most.

Kalman Filter was firstly described in 1960. The precise explanation of the method might be found in Welsh and Bishop [2] paper. The point of the approach is to keep the information about previous measures and maximize the a-posteriori probability of the position’s prediction. But instead of keeping a long list of previous measures we are creating a model that is iteratively updated.

Gesture representation

Picture 3. The shape recognition step. Click to watch the movie.

The representation of a gesture is a sequence of directions and shapes. A direction is represented using the idea taken from the compass rose with the eight principal winds, except the winds are vectors outgoing from the center. Every direction has assigned a number from 1 to 8 and describes the direction of the last move.

The shape is one of the predefined shapes of the hand. The system gathers the contours of the hand and uses a SVM (Support Vector Machine) classifier to obtain related label. The contours were represented using Freeman Chain Codes.

Finally a gesture could look like a sequence of pairs [1,’A’], [1, ‘B’] , [2, ‘B’], [, ‘STOP’]. Notice that when the ‘STOP’ shape is met, the direction doesn’t matter.

Recognition

Hidden Markov Model has been applied for gesture recognition. It was trained using some set of predefined actions that were then translated to a language word. Every sequence of movements was ending with a special move meaning “end of gesture”. It simplified the processing a lot, but it is not a very good practice and “don’t try it at home” ðŸ˜‰ The results might be seen on the last video.

Training

The training was split into two parts. Firstly I was learning SVM just for the shape recognition. When it was done, I approached the second step of learning the HMMs using outputs from SVMto describe the shape.

Conclusion

Even if the system I implemented was very limited, it proved the concepts of the features choice and the tracing algorithms as strong enough for a human-computer interaction. Nevertheless the segmentation method has to be improved as the current one is a home-made solution and the assumptions it takes are too strong. Secondly the language processing shouldn’t require the “stop word” and should be done fully dynamically.

References

[1]. K. Fukunaga, “Introduction to Statistical Pattern Recognition”, Boston, Academic Press, 1990
[2]. G. Welsh, G. Bishop “An introduction to Kalman filter” (Technical Report TR95-041), University of North Carolina, Chapel Hill, NC, 1995
[3]. D. Comaniciu and P. Meer, “Mean shift analysis and applications”, IEEE Internation Conference on Computer Vision (vol 2, p. 1197), 1999.