Master theorem

Analysing the complexity of recursive functions is not always easy, especially when the function does something fancy. Luckily, clever minds have provided a cookbook that makes determining the complexity of a recursive algorithm quite easy.

We may describe a recursive function as:
$R(N) = A \cdot R(\frac{N}{B}) + O(N^C)$
Let me explain the symbols. R is a recursive function called on a problem of size N. Being recursive, it calls itself A times, each time on a smaller problem of size N/B. In many recursive implementations, especially divide and conquer techniques, there is also a “merge” step (please think about mergesort). The complexity of the merge step is expressed by the last term of the above formula.

Then we may determine three cases that will describe the complexity of the call.
1. $A = B^C$ then $R(N) = O(N^C \log N)$
2. $A < B^C$ then $R(N) = O(N^C)$
3. $A > B^C$ then $R(N) = O(N^{\log_B A})$

Do you remember mergesort? Mergesort makes two recursive calls, each on half of the problem. When the recursion is done, there comes an extra “merge” step with O(N) complexity. If we plug this information into the formula, we get A = 2, B = 2, C = 1, so $A = B^C$ ($2 = 2^1$). We take the first case and obtain a complexity of $O(N \log N)$.
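The theorem also covers recursions that make a single call. Binary search, for instance, calls itself once (A = 1) on half of the problem (B = 2) and does constant extra work (C = 0); since $1 = 2^0$, the first case applies and gives $O(N^0 \log N) = O(\log N)$. A minimal recursive sketch:

```cpp
#include <vector>

// R(N) = 1 * R(N/2) + O(1): A = 1, B = 2, C = 0 -> case 1 -> O(log N)
int bsearch_rec(const std::vector<int> & a, int lo, int hi, int el){
    if(lo > hi) return -1;                  // empty range: not found
    int mid = lo + (hi - lo) / 2;
    if(a[mid] == el) return mid;
    if(a[mid] < el)                         // one recursive call on half the range
        return bsearch_rec(a, mid + 1, hi, el);
    return bsearch_rec(a, lo, mid - 1, el);
}
```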

Find the k-closest elements to a given value in a sorted array

We are given a sorted array, a value V, and a number K. The value V may appear in the array, but it does not have to. We are expected to find a set of K elements from the array that are the closest to the value V.

The solution uses the famous binary search algorithm. First we find the value V itself, or the value closest to V, in the array. We obtain the latter simply as a result of a regular binary search, which returns the position of the element if it exists, or the position where it would appear. Let’s call this position P. We know that the element at position P will be included in the result set. Now we need to consider the elements on the left- and right-hand side of P. As the array is sorted, the element closest to the one at position P lies at position P - 1 or P + 1. We choose the one that is closer, and iterate until we fill the set to the expected size. An example code:

unsigned int binary_search(const vector<int> & array, int el){
    // assumes a non-empty array
    unsigned int begin = 0;
    unsigned int end = array.size() - 1;
    while(begin < end){
        unsigned int medium = begin + (end - begin)/2;
        if(array[medium] < el)begin = medium + 1;
        else end = medium;
    }
    return begin; // position of el, or of the closest candidate
}

vector<int> get_k_closest(const vector<int> & array, int el, unsigned int k){
    unsigned int split = binary_search(array, el);
    int l = split, r = split + 1;
    unsigned int counter = k;
    vector<int> result;
    while(counter){
        if(l >= 0 && r < (int)array.size()){
            // choose the element that is closer
            if(abs(array[l] - el) < abs(array[r] - el))
                result.push_back(array[l--]);
            else result.push_back(array[r++]);
        } else if(l >= 0){
            result.push_back(array[l--]);
        } else if(r < (int)array.size()){
            result.push_back(array[r++]);
        } else break; // fewer than k elements in the array
        counter--;
    }
    return result;
}


Almost sorted array

A sorted array is a great thing, as it is the basis for a number of efficient algorithms. However, it may happen that you are given an array that is sorted, but not completely. Below you may find two interesting problems that receive an almost sorted array as input.

Find an element in an almost sorted array.
To be more precise, in this case we consider an array that was sorted, but some of its elements were swapped between adjacent cells. So each number lies in its desired position, or at distance 1 from it. As an example, let’s take a look at the array [1,4,2,5,7,6].

Obviously we could make a linear scan from left to right and obtain the result in O(N) time. However, we can do better. As you might expect, we are going to use binary search. Notice that if we are looking for an element E, it might be located in one of three cells. Let P(E) represent the position of E in a fully sorted array; then in our case we need to consider positions P(E) - 1, P(E), P(E) + 1. The position P(E) is determined using a regular binary search. Please find an example C++ implementation below:

int find(const vector<int> & array, int el){
    if(!array.size())return -1;
    int begin = 0, end = array.size() - 1;
    while(begin <= end){
        int medium = begin + (end - begin) / 2;
        // check three places instead of one
        if(array[medium] == el)return medium;
        else if(medium > begin && array[medium - 1] == el)return medium - 1;
        else if(medium < end && array[medium + 1] == el)return medium + 1;

        // choose which way to go
        if(array[medium] < el)
            begin = medium + 2; // as we checked medium + 1 already
        else end = medium - 2; // as we checked medium - 1 already
    }
    return -1; // not found
}


Sort an almost sorted array.
As you may remember, insertion sort performs best when the input array is almost sorted. It provides O(N * K) complexity, where K is the maximum distance between an element and its desired position. When K is small, the performance is very good; however, when K grows close to N, we end up with O(N^2) complexity. If we know K, we can proceed in a way similar to smoothsort. In the first step we build a min-heap of the first K + 1 elements of the array (an element may be up to K cells away, so the minimum of the whole array lies among them). Then we scan the array from left to right and extract the minimum from the heap to insert it into the proper position in the array. We also need to remember to feed the heap with the next element. An example implementation could look as follows:

void sort(vector<int> & array, unsigned int k){
    // min-heap over the first k + 1 elements
    priority_queue<int, vector<int>, greater<int>>
        heap(array.begin(), array.begin() + min<size_t>(k + 1, array.size()));

    for(unsigned int i = 0; i < array.size(); i++){
        array[i] = heap.top(); heap.pop();
        if(i + k + 1 < array.size())
            heap.push(array[i + k + 1]);
    }
}


This solution performs better than insertion sort because, in this case, pushing a new element mostly appends it near the bottom of the heap, and the same applies to accessing and removing the top of the heap. In other words, the heap doesn’t require many rearrangements. This happens because we process an almost sorted array, and only from time to time do we need to correct the heap structure. This leads to an O(N log K) algorithm, which behaves even better than insertion sort; however, it requires O(K) extra memory.
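As a quick sanity check, the idea can be exercised on a small input. This self-contained variant (named `sort_almost_sorted` here to avoid clashing with `std::sort`; the heap holds k + 1 elements so that an element displaced by exactly k is still covered) sorts an array where every element is at most one cell out of place:

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Sorts an array where every element is at most k cells away from its
// sorted position; the min-heap always holds the k + 1 candidates for
// the next output slot.
void sort_almost_sorted(std::vector<int> & array, unsigned int k){
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        heap(array.begin(),
             array.begin() + std::min<std::size_t>(k + 1, array.size()));
    for(std::size_t i = 0; i < array.size(); i++){
        array[i] = heap.top(); heap.pop();   // place the next minimum
        if(i + k + 1 < array.size())
            heap.push(array[i + k + 1]);     // feed the next candidate
    }
}
```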

Maintain the median of a stream of numbers

There is a stream of numbers of unknown size. We should process the stream in such a way that at every moment we are able to return the median of the numbers processed so far.

If you need a refresher on what a median is, please refer to the following wiki article. The simplest solution would be to add incoming numbers to a data structure, like a C++ vector, and sort it every time we are asked for the median. It wouldn’t be that bad, as with quicksort we could get O(N log N) complexity, where N is the size of the data read so far.
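A minimal sketch of that naive approach (the name `median_by_sorting` is mine); it sorts a copy of the data on every query, taking the mean of the two middle values when the count is even:

```cpp
#include <algorithm>
#include <vector>

// O(N log N) per query: sort a copy of the data, then pick the middle.
double median_by_sorting(std::vector<int> data){  // by value: original untouched
    std::sort(data.begin(), data.end());
    std::size_t n = data.size();
    if(n % 2) return data[n / 2];                 // odd count: middle element
    return (data[n / 2 - 1] + data[n / 2]) / 2.0; // even count: mean of middle two
}
```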

However, there is a better solution that provides the median instantly, but requires some extra work while processing incoming numbers. Let’s imagine that we maintain a split of the data into two parts, such that the difference between the sizes of the parts is not greater than 1. The left part contains the smaller numbers, and the right part the larger ones. In other words, every element from the left part is smaller than any element from the right part. How would we calculate the median now? If the left and right parts have equal size, we just take the maximum element of the left part, MAXL, and the minimum of the right part, MINR. Then the median = (MAXL + MINR)/2. If the sizes differ by one, we take MAXL if the left part is bigger, or MINR otherwise.

But then an important question appears: how do we maintain the size balance of both parts? It’s quite simple: if the left part is bigger than the right one by more than one element, we take MAXL and move it to the right part, where it becomes the new MINR. For the right part we proceed the same way, i.e. we move MINR to the left, where it becomes the new MAXL. But how do we recalculate the new MAXL, once the old one has moved to the right?

Here is where data structures come in. Let’s use a max-heap to keep the left part and a min-heap to keep the right part. Then we get exactly what was described in the previous paragraph without too much implementation effort, and all the requirements are met. An example C++ code could look like:

priority_queue<int, vector<int>, less<int>> left; // max-heap, left part
priority_queue<int, vector<int>, greater<int>> right; // min-heap, right part

while(true){
    int number;
    cin >> number; // process the stream

    // push to the proper heap
    if(right.empty() || number >= right.top())
        right.push(number);
    else left.push(number);

    // maintain the size difference
    if((int)left.size() - (int)right.size() > 1){
        right.push(left.top()); left.pop();
    } else if((int)right.size() - (int)left.size() > 1){
        left.push(right.top()); right.pop();
    }

    // get the median
    double median = 0;
    if(left.size() == right.size())
        median = (left.top() + right.top())/2.0;
    else if(left.size() > right.size())
        median = left.top();
    else median = right.top();

    // do something with the median...
}