Almost sorted array

A sorted array is a great thing, as it is a base for numbers of efficient algorithms. However it may appear that you are given an array that is sorted, but not completely. Below you may find two interesting problems, that receive almost sorted array as an input.

Find an element in almost sorted array.
To be more precise, in this case we consider an array, that was sorted, but some of its elements were swapped between adjacent cells. So the number lays in a desired position, or the distance to the desired position is equal 1. As an example let’s take a look at the array [1,4,2,5,7,6].

Obviously we could make an linear scan from left to right, and obtain the result in O(N) time. However we can go better. As we could expect, we gonna use binary search. Notice, that if we are looking for element E, it might be located in one of three cells. Let P(E) represents the position of E in a sorted array, then in our case we need to consider positions P(E) – 1, P(E), P(E) + 1. Position P(E) is going to be determined, by using regular binary search. Please find an example C++ implementation below:

int find(vector<int> array, int el){
 if(!array.size())return -1;
 int begin = 0, end = array.size() - 1;
 while(begin < end){
  int medium = b + (e - b) / 2;
  // check three places instead of one
  if(array == el)return medium;
  else if(medium > begin && array == el)return medium - 1;
  else if(medium < end && array == el)return medium + 1;

  // choose which way to go
  if(array < el)
   begin = medium + 2; // as we checked medium + 1 already
  else end = medium - 2; // as we checked medium - 1 already
 }
}

Sort almost sorted array.
As you may remember, insertion sort has the best performance when the input array is almost sorted. It provides O(N * K) complexity, where K is the maximum distance between elements and they desired position. When K is small, then the performance would be very good, however when K grows near to N, that we are ending in O(N ^ 2) complexity. However if we know the K, we can proceed in a similar way to smoothsort. In the first step we would build a min-heap of K first elements from the array. Then we may scan the array from left to right and extract the minimum from the heap to insert it into the proper position into the array. We also need to remember, to update the heap with the next element. An example implementation could look like follows:

void sort(vector<int> & array, unsigned int k){
 priority_queue<int, vector<int>, greater<int>> 
  heap(array.begin(), array.begin() + k);

 for(unsigned int i = 0; i < array.size(); i++){
  array[i] = heap.top(); heap.pop();
  if(i + k < array.size())
   heap.push(array[i + k]);
 }
}

This solution performs better than insertion sort, as in this case, building the heap mainly adds the new element to the end of it. The same about the access and remove operation from the top of the heap. In the other words, the heap doesn’t require many rearrangements. It happens because we process almost sorted array, and only from time to time, we need to correct the heap structure. This leads to O(N * log K) algorithm, that behaves even better than insertion sort. However it requires extra memory O(K).

Advertisements

Maintain the median of the stream of numbers

There is a stream of numbers, that has unknown size. We should process the stream in such a way, that at every moment we should be able to return the median of currently processed numbers.

For understanding what median is, please refer to the following wiki article. The simplest solution would be to add incoming numbers to a data structure, like c++ vector, and sort it every time, we are asked for the median. It wouldn’t be that bad, as with quicksort we could get O(N log N) complexity, where N is the size of currently read data.

However there is a better solution, that provides the median instantly, but requires some extra step while processing incoming numbers. Let’s imagine that we maintain the split into two parts in such a way that the difference between sizes of splits is not greater than 2. The left part contains smaller numbers, and the right larger. In the other words, every element from the left part is smaller than any element from the right part. How would we calculate the median now? If the left and right parts have equal size, we can just take the maximum element from the left MAXL part and minimum from the right MINR. Then the median = (MAXL + MINR)/2. If the array size differs by one, we take MINL if the left part is bigger, or MINR otherwise.

But then the important question appears: how to maintain the same size of both parts?. It’s quite simple, if the left part is bigger than the right one, by more than one element, we take the MAXL and move it to the right part. It becomes the new MINR. For the right part we proceed in the same way. I mean we move MINR to the left, that becomes the new MAXL. But how do we reculcate the new MINL, when we moved it to the right?

Here is where the data structures comes in. Let’s use a max heap to keep the left part and min heap to keep the right part. Then we get exactly what was described in the previous paragraph, without to much implementation and all the requirements are met. An example c++ code could look like:

priority_queue<int, vector<int>, less<int>> left; // left heap
priority_queue<int, vector<int>, greater<int>> right; // right heap

while(true){
 int number;
 cin >> number; // process the stream

 // push to the proper heap
 if(right.empty() || number >= right.top())
   right.push(number);
  else left.push(number);

 // remain the size difference
 if((int)left.size() - (int)right.size() > 1)
  right.push(left.top()); left.pop();
 else if((int)right.size() - (int)left.size() > 1){
  left.push(right.top()); right.pop();
 
 //  get the median
 int median = 0;
 if(left.size() == right.size())
  median = (left.top() + right.top())/2;
 else if (left.size() > right.size())
  median = left.top();
 else median = right.top();

 // do with the median... something
 {...}
}

Array pivoting

In the quicksort algorithm the most important part is the partition step. The implementation of it, is actually not difficult, however some implementation store data on the both sides of the array, instead of just one, what leads to the more complex and error prone algorithms. In this article I would like to show that for n-partition, we need n – 1 storage indexes.

In the pivot-partitioning we just move the smaller than pivot elements on the begining of the array. We can imagine it as a stack that is growing within the array size, from the bottom to the top. Secondly, to trace the top of the stack we use a variable called storagePlace, that indicates the size of the stack and the next place where to insert an element.

void partition(std::vector<int> & v){
 int pivot = v[0]; // simplified
 swap(v[0], v[v.size() - 1]);
 unsigned int storagePlace = 0;
 for(int i = 1; i < v.size() - 1; i++)
  if(v[i] < pivot)swap(v[i], v[storagePlace++]); 
 swap(v[storagePlace], v[v.size() - 1]);
}

There is a famous problem of dutch flag described firstly by Dijkstra. Imagine that we have 3 kind of elements, let’s say negative numbers, 0s, positive numbers. We want to partition the array in the way that negative numbers go first, then 0s and finally positive. A while ago we mentioned that we can imagine pivoting as maintaining a stack within the array. We extend this idea to maintain two stacks now: one grows from the bottom to the top, and the other from the top to the bottom. The former stores negative numbers, the latter positives.

void dutch_flag_partition(std::vector<int> & v){
 int top = v.size() - 1;
 int bottom = 0;
 for(int i = 0; i <= top; i++){
  if(v[i] < 0)swap(v[bottom++], v[i]);
  else if(v[i] > 0)swap(v[top--], v[i]);
 }
}

In this case we might be sure that the elements between [0, bottom) are already the negative numbers, between [bottom, i) we have only 0, between (top, v.size()) we have positive elements. The uknown part remains between [i, top].

Both solution offer O(n) complexity for the problem. In the second case there is no stable solution that would have O(n) complexity. However if you need to keep it stable, take a look at this article.

Median of two sorted arrays

We are given two arrays of integers and both of them are sorted. Find the algorithm to determine the median faster than O(N log N).

Let the arrays be called A1 and A2, and have size N and M respectively. If we would put the numbers into one array, then we would know that the median will be located at (N + M + 1)/2 if N + M is odd, otherwise we would take the ((N + M)/2 + (N + M + 1)/2)/2. However we won’t put them into one array. Instead we will browse them simultaneously.

int median(vector<int> a1, vector<int> a2){
 int i1, i2;
 i1 = i2 = 0;
 int merged_size = a1.size() + a2.size();
 int last, last_last;
 last = max(a1[0], a2[0]);
 last_last = min(a1[0], a2[0]);

 while(i1 + i2 < ((merged_size + 1)/2)){
  last_last = last;
  if(i1 < a1.size() && a[i1] < a[i2]){
   last = array[i1];
   i1++;
  }
  else if(i2 < a2.size()){
   last = array[i+1]
   i2++;
  }
 }
 if(merged_size % 2)
  return last; // the middle element
 else return (last + last_last)/2;
}

Thanks to using two extra variables we didn’t have to introduce stack or any other structure that would monitor the order of elements. Thanks to that the memory complexity of the problem is constant.

Find the number of set bits in the result of multiplication of big numbers

You are given a number that might be so big that it doesn’t fit in any standard type, however the number is provided in a form of [a, b, c, d, …] where every letter stands for the bit number that is set in the byte representation, so you can read it easily. In example [1, 3, 4] represents 1101 = 13. Then you multiply the number by 3. How many bits are set to 1 in the binary representation of the result?

This is a real interview question that has been asked to a friend of mine quite recently. To figure out the solution, we need to firstly consider what the multiplication by 3 really means. Let’s take an example.

 01101
x00011
--------
 01101
+11010
--------
0100111

As you can see the result is the sum of the number itself and the number shifted to left by 1. The former is obvious, as it is the result of multiplying by 1. The latter is what multiplication by 2 does. Multiplication by 2^n shifts the byte left by n positions in a general case.

Coming back to our result we get [1, 3, 4] x 3 = [1, 3, 4] << 1 + [1, 3, 4] = [2, 4, 5] + [1, 3, 4]. Now we may answer the question: what bytes will be set after this addition? Recalling the binary arithmetic and we may notice that the solution lays in calculating the result of the addition of the. In simpler words we just need to implement binary addition of numbers represented in the form of [2,4,5] and [1,3,4]. The answer will be the size of the result structure.

int getNumberOfSetBits(vector<int> number)
{
        // calculate the number shifted by 2
        vector<int> shifted = number;
        for(unsigned int i = 0; i < number.size(); i++)
                shifted[i]++;

        bool carry = false;
        vector<int> result;

        // the first element of number will surely be a part of the result
        result.push_back(number[0]);

        // start the addition 
        for(unsigned int i = 1; i < number.size(); i++)
        {
                if(number[i] == shifted[i-1])
                {
                        if(carry)
                        {
                                result.push_back(number[i]);
                        }
                        carry = true;
                }
                else
                {
                        result.push_back(number[i]);
                        result.push_back(shifted[i-1]);
                        carry = false;
                }
        }

        // do not forget the carry the last one from shifted
        int last = shifted.size() - 1;
        if(carry)
                result.push_back(shifted[last] + 1);
        else result.push_back(shifted[last]);

        return result.size(); 
}

The code could be optimized as actually the structure result is not necessary. But it keeps more visible what was the idea behind the algorithm.

Regards

Joe Kidd