K-way merge

There is a certain number if sorted arrays and each of them contains numbers. We want to take an element from every array in such a way, that the difference between the elements is the lowest.

For simplicity let’s say N=3 and the arrays are like:

A = {4, 10, 15, 20}
B = {1, 13, 29}
C = {5, 14, 28}

Now we need to choose one element from each of array, let’s call them a, b, c, such that |a – b| + |b – c| + |c – a| is minimal. For this case the solution is a = 15, b = 13, c = 14. But how to find it?

For solving the problem we are going to use minimal heap data structure with constant size. In our case the size of the heap is equal to number of arrays we process, so the size = 3. In the first step we take the minimal element from the array, and we store the difference of its sum. Now we are going to remove the element from the top of the heap and replace it with the next element from the array the removed element comes from. Inserting the a new element into the heap will rearrange the elements so that the minimal one will be always on the top. We store again the new difference if it’s smaller than the previous one and we repeat until one of the arrays becomes empty. Let’s take a look at an example:

H = {1, 4, 5}, diff = 8
we remove 1 and put 13 as 1 belonged to the same array
H = {4, 5, 13}, diff = 18
we remove 4 and put 10 as 4 belonged to the same array
H = {5, 10, 13}, diff = 16
we remove 5 and put 14 as 5 belonged to the same array
H = {10, 13, 14}, diff = 8
we remove 10 and we put 15 as 10 belonged to the same array
H = {13, 14, 15}, diff = 4
...

I was inspired to describe this solution, that is called k-way merged, after meeting several questions related to it on careercup – but without a very easy explanation. I hope it helps a bit. Actually it helps at least me 🙂

Best Regards

Joe Kidd

Advertisements

Find the missing number of an arithmetic progression

We are given an array representing the elements of an increasing arithmetic progression. The arithmetic progression might be described by a formula a_n = a_1 + (n - 1) r , that means that n-th element of it, is a sum of the first element and (n – 1) steps r. We are expected to find the missing number in the array taking an assumption that the first and last element are never missing.

To make the problem more understandable let’s take a following example of an array: [1 3 5 7 9 11 15 17]. W can see that the a_1 = 1, and the r might be calculated by making use of the assumption that the first at the last element are never missing r = \frac{(a_n - a_1)} {(n - 1) } = (17 - 1)/8 = 2. We can just check by taking a deeper look at the table that the missing element is 14. Actually we can just scan the table from left to right and compare consecutive elements and stop when the difference between them is different than expected. It leads to O(n) solution. But it smells like we can go better.

As the progression is increasing and we know the precise formula for the element at n-th position we may apply the famous binary search. We just use the fact that if at the n-th position the element is bigger than expected, then the missing number has to be on the left side of the n-th, otherwise the missing number has to be on the right side. However before approaching the previous test there is a necessity to check the current element and the previous one, as the missing element may lay just between them.

The pseudocode could look like follows:

if array[n/2] - array[n/2 - 1] > r then
// we got a missing element that is array[n/2] - r
if array[n/2] > array[begin] + (n/2 - 1)r
// then we we search left part of the array
if array[n/2] = array[begin] + (n/2 - 1)r
// then consider right part of the array

We could code it as:

int findMissingElement(int array[], int begin, int end, int r)
{
    int m = (end - begin)/2;
    if(array[m] - array[m - 1] != r)
        return array[m] - r;
    if(array[m] > array[begin] + (m - 1) * r)
    {
        return findMissingElement(array, begin, m - 1, r);
    }
    else if(array[m] == array[begin] + (m - 1) * r)
    {
        return findMissingElement(array, m + 1, end, r);
    }
}

The complexity is obviously O(log N).

Best Regards

Joe Kidd

Longest Arithmetic Progression

Input: A sorted array of numbers.
Output: The length of the longest arithmetic progression
Problem: We are given a sorted array and we need to find the length of the longest arithmetic progression..
Example: [1, 9, 10, 11, 16, 21, 28, 30, 31], answer=4.
Complexity: Time O(n^2), Memory 0(n^2)

Before reading this article, it’s strongly recommended to take a look at this one, as it describes the basic idea we gonna use in the solution.

The solution will base on a dynamic programming approach, as the problem has a property of optimal substructure – we can build arithmetic progression of length N if we already know one with lenght (N-1).

Let’s consider three indices L, M, R, L < M < R, and let’s use the property describing arithmetic sequence mentioned in the previous article. If we know three indices L,M,R that are in a arithmetic progression, and we know that M, R, Q indices are also in the aritmetic progression, then L,M,R,Q are aslo in arithmetic progression. We will use this property to create dynamic programming solution.

Another thing worth considering is that every pair of elements is a arithmetic progression already, what gives us initialization property of an array.

We will have A[L][M] array, where we will store the longest progression length, starting with L, M. We will process the progression from right to left, trying to find L and R for a fixed value of M. The code may look like this:

int longestProgression(vector<int> iProgression)
{
    int A[iProgression.size()][iProgression.size()];
    int maxLen = 2;
    // initialize last columnt values
    for(int i = 0; i < iProgression.size(); i++)
        for(int j = 0; j < iProgression.size(); j++)
            A[i][j] = 2;

    for(int M = iProgression.size() - 1; M > 0; M--)
    {
       int L = M - 1;
       int R = M + 1;
       while(L >= 0 && R < iProgression.size())
       {
           int LR = iProgression[L] + iProgression[R];
           int M2 = IProgression[M] * 2;
           if(LR > M2)
           {
               L--;
           }
           else if(LR < M2)
           {
               R++;
           }
           else
           {
               A[L][M] = A[M][R] + 1;
               maxLen = max(A[L][M], maxLen);
               L--; R++;
           }
       }
    }
    return maxLen;
}

The complexity of the solution is O(n^2) and the memory is O(n^2) too.

Find 3-elements long arithmetic progression in a sorted array

So we are given an array that is sorted in ascending order, and we are expected to find 3-elements that are arithmetic progression. As the O(n^3) solution is trivial we are expected to find a better one. Obviously the elements are spread in the array, but the order is maintained.

As an example we may consider the array [1, 9, 10, 11, 16, 21, 30] that has the arithmetic progression [1,11,13]. The trick that is going to lead to the O(n^2) solution bases on the property of an arithmetic progression, that says: 3 elements L, M, R are arithmetic progression when L < M < R. and 2 * M = L + R.

Cool! As the array is sorted L lays on the left hand side of M, and R on the right hand side. So for a fixed M, we need to find L and R meeting the condition above. We may firstly fix L = M – 1, and R = M + 1 and the iterate updating the values of L and R according to the following rule:

if array[L] + array[P] > 2 * array[M] then L--
if array[L] + array[P] < 2 * array[M] then P++
if array[L] + array[P] == 2 * array[M] then we have the answer

What leads us to a implementation of O(n^2).

bool hasArithmeticProg(vector<int> array)
{
    if(array.size() < 3)
        return false;
    
    for(int M = 1; M < array.size() - 1; M++)
    {
        int L = M - 1;
        int R = M + 1;
        while(L >= 0 && R < array.size())
        {
            if(array[L] + array[R] == 2 * array[M])
                return true;
            else if(array[L] + array[R] > 2 * array[M])
                L--;
            else R++;
        }
    }
    return false;
}

The problem is not really about finding the arithmetic progression, is rather about efficient way of finding if there are two elements in a sorted array that sum up to a given number. The arithmetic progression is just an extra thing to make it more funny.

Best Regards

Joe Kidd

The length of the longest sequence of 0s in the bit representation

We are given an integer value and we are expected to provide the information about the length of the longest sequence of 0s in its bits representation, but the solution should be not worst than O(log N). In example 167 = 10100111, the answer is 2.

O(log N) for this problem? It’s gonna be freaking tricky outstanding solutions that moves the limits of algorithms up to the boundaries and even a step behind! Let’s take a look at the code firstly and then we will explain what are we doing.

int maxZeroLen(int N)
{
int maxLen = 0;
int curLen = 0;
while(N)
{
int remainder = N % 2;
N = N >> 1;
if(!remainder)
curLen++;
else curLen = 0;
maxLen = max(maxLen, curLen);
}
return maxLen;
}

Yes it is that simple. Sorry if you expected some magic here. When we divide by 2, we actually decrease the size of the problem in the logarithmic fashion 1/2, 1/4, 1/8… etc. So the solution is O(log N). I really like this problem as it makes you think and explains what logarithmic complexity really is.

Best Regards

Joe Kidd

Check if a number is a power of 4

Input: A number N
Output: True if N is a power of 4, False otherwise
Problem: We are given a number, and we need to check if it’s a power of 4
Example: 64, result = True
Complexity: Time O(log log n), Memory 0(1)

This problem is very trivial and basically we could solve it, in a iterative way, just checking consecutives powers of four and compare them against N. But if it were that simple, it wouldn’t appear here. Moreover we have a complexity limit that is O(log log n). Checking every power of 4 would be efficient, especially if we would use bit shifting instead of normal power, but not efficient enough, as we would get O(log n) solution.

Before we will discuss the real solution let’s take a look at the binary representation of powers of 4.

4^0 = 0000001
4^1 = 0000100
4^2 = 0010000
4^3 = 1000000

As we can see the bit is set only at the odd positions and only once. How can check this condition in a effective way? We can use a powerful binary search again.

1. We divide the number bit representation into two pieces.
2. Only one piece might be not zero, if this condition is not met, the number is not a power of 4.
3. Choose a non zero part and go to step 1. as long as you find number 1, then go to step 4.
4. Check if the bit is on the odd position, if not return false, as the number is not a power of 4 otherwise go to step 5.
5. Flip the bit from the step 4, if the number euqals zero now, it is a power of 4 , otherwise it’s not.

This algorithm would give us the expected O(log log n) complexity.

There is also another solution, that is way easier to implement and works in O(1) time. As we can see from the example above, the bits are set on the odd positions. But let’s go in a opposite way and think about even positions. We know, that no even position might be set. Secondly we know that power of 4 is also a power of 2. If this condition are met, then we have a power of 4 number.


bool isPowerOf4(int N)
{
int mask = 0xAAAAAAAA; // mask with set values at the even position
return (N & mask == 0) && (N & (N - 1) == 0);
}

This solution is prettier, but I like more the one with binary search ans it shows an interesting application of this algorithm.

Best Regards

Joe Kidd

One-sided (meta) binary search

The concept described in this article changes a little bit the way we look at the binary search. I have met also a different name of this approach on topcoder pages, where it’s called meta-binary search. The second time I met this algorithm in the Skiena’s book “Algorithms Manual Design”, where one-sided binary search is used to describe it.

Honestly, I prefer the Skiena’s name, as it seems to be more accurate and describe what really happens. The classical binary search algorithm operates on a sub-range of values, like implemented on the related wiki page. As you can see, there is an upper and lower boundary provided, defining the sub-range of the array to be searched. Instead of considering the boundaries and finding the mid point, we can actually consider only the mid point – and this is what the meta binary search does. As an example is better than precept, let’s take a look at some.

Let’s say we have a sorted array and we need to find a given value. Just not to create messy examples, let’s consider the simplest one we can imagine: [1,2,3,4,5,6,7]. If we look for a number six, it will be find at the position 5 (0-indexed array).

In the first step we need to find a value of log N, that will be useful for comparing elements and choosing the proper part of the array to analyse. Of course log is 2-based. We can do it using simple iteration, like this:

int getLogOf(int N)
{
    int log2 = 0
    while((1 << log2) <= N)log2++;
    return --log2;
}

What this value really does, it is explained in the listening below, but in the “human being” words, we may define it as a distance to the next mid element.

Then we can run the search itself, using the following piece of code:

int find(int array[], int N, int K)
{
    int log2 = getLogOf(N);
    int currentPosition = 0;
    while(log2 >= 0)
    {
       // check if we got a solution
       // if yes, the return it
       if(array[curPosition] == K)
           return curPosition;
       // check the next postion
       int newPosition = curPosition + (1 << log2);
       // if it's smaller or equal then we have to
       // update the current position
       if(newPosition < N && array[newPosition] <= K)
           curPosition = newPosition;
       // decrement the skip size
       log2--;
    }
return -1;
}

So we have an array [1,2,3,4,5,6,7], and we look for 6. Because the size of the array is 7, we know that the log base 2 of 7 is 2 (2^2 = 4, but 2^3 = 8, so we take 2 as the less evil). Let’s see the iterations, the bold element indicates currently considered point:

Iter1: [1,2,3,4,5,6,7]
  log=2, currentPosition = 0, newPosition = 0 + 2 ^ 2 = 4, array[newPosition] = 5 < 6
  currentPosition = newPosition
  log--
Iter2: [1,2,3,4,5,6,7]
  log=1, currentPosition = 4, newPosition = 4 + 2 ^ 1 = 6, array[newPosition] = 7 > 6
  log--
Iter2: [1,2,3,4,5,6,7]
  log=1, currentPosition = 4, newPosition= 4 + 2 ^ 0 = 5, array[newPosition] = 6 == 6
  return 5

I hope this article makes you a little bit more familiar with the concept of one sided binary search, that is a very handy tool – check the top coder lowest common ancestor article to know why!

Best Regards

Joe Kidd

Find the K-closest point to the origin

There is an unsorted array containing the coordinates of points in a 2D space and a positive number k. We are ask to find the the k-closest point to the origin, it is to the point with coordinates (0, 0). In example for an array [(5,5), (-4,3), (3,2), (1,0), (6, -5)], and k = 2, the answer would be (3, 2).

The first approach, we could use, is sorting. I mean firstly storing the distance to the origin in an array by extending the point structure, and then sorting it according to the newly added field. After that, we could find the answer in the k-th cell of the array. This solution would O(n + n log n) and unfortunately it’s not optimal. There is O(n) solution.

The problem is a special case of the quickselect algorithm, that uses the same partition procedure, as quicksort. But before we can apply the quickselect approach, we need to do some extra pre-processing related to distance calculation.

Let’s suppose the structure of our Point, except coordinates x, y, contains also an extra value called distance. So we get a code like this:

struct Point
{
  double x, y;
  double distance;
};

Now, in the first step of the algorithm, we need to calculate the distance from the origin to each point contained in the array.

void setDistanceFromOrigin(Point points[], int size)
{
    for(int i = 0; i < size; i++)
        points[i].distance = sqrt(points[i].x * points[i].x +
                    points[i].y * points[i].y);
}

Then we can apply the quickselect algorithm for finding the k-smallest element, using the calculated distance for comparison. So far it sounds like quicksort. But in this case we are not going to sort the whole array, but only the part, where the element of interests lays. And actually sorting is a little bit too strong word here. We will pivot the array around the mid point so that lower elements will be located on the left, and larger on the right hand side of the pivot. Then we will consider only the subarray where the k-th element is located. However the partition looks like in the quicksort algorithm:

int partition(vector<Point> & points, int l, int r, int m)
{  
   double pivot = points[m].distance;
   swap(points[m], points[r]); m = r; r--;

   while(l < r)
   {
      if(points[l].distance >= pivot){
         swap(points[l], points[r--]);
      }
      else l++;
   }
   swap(points[l], points[m]); m = l;
   return l;
}

The difference between quickselect and quicksort lays in the calling procedure, that in case of quicksort is called for both parts of the array, but in case of quickselect only for the part were the elements of interest is located:

Point & quickselect(vector<Point> points, int l, int r, int k)
{
    int m = (l + r) / 2;
    m = partition(points, medium);
    if(m == k)
         return points[m];
    if(pivot < K) 
        return findKClosestPoint(points, m + 1, r, k);
    else 
        return findKClosestPoint(points, l, m - 1, k);
}

This algorithm has O(n) complexity and doesn’t require extra space, so it’s quite efficient.

Best Regards

Joe Kidd

Find all nodes in a binary tree that are at the distance K from the root

Input: A binary tree, number K
Output: A list of all nodes that are at the distance K from the root
Problem: Find all nodes in a binary tree that are at the distance K from the root
Complexity: Time O(n), Memory 0(n)
Example:

     
L:0       A
...    /     \
L:1   B       C
...   \     / \
L:2    D   E   F
...   / \
L:3  G   H


For K = 2, we got [D, E, F], L:N stands for the level N.

Firstly, let’s assume our tree node is defined in the following way:

struct Node
{
    struct Node * left;
    struct Node * right;
    int value;
};

Basically what we are looking for is the level K, as all nodes on this level will be at the distance of K from the root. In case of this problem there is really not much to think. The only choice we need to make is if go for breadth-first-search, or depth-first-search algorithm. Then we go to the distance K from our root node and we add the nodes at this level to the result. Below you can find both of the algorithms implemented.

DFS approach:

void findKFarFromRoot(Node * node, int K, vector result)
{
    if(!node)return;
    if(K == 0)
       result.append(node);
    else {
       findKFarFromRoot(node->left, K - 1, result);
       findKFarFromRoot(node->right, K - 1, result);
    }
}

BFS approach:

vector findKFarFromRoot(Node * root, int K)
{
   vector<pair<Node *, int> > result;
   if(!root)
     return result;

   queue q;
   q.push(make_pair(root, K));

   while(!q.empty())
   {
      pair currentNode = q.front(); q.pop();
      if(currentNode.second == 0)
        result.push_back(currentNode);
      else
      {
        // add left and right if they exist, and decrement the
        // distance
        Node * cur = currentNode.first;
        if(cur->left)
           q.push(make_pair(cur->left, currentNode.second - 1));
        if(cur->right)
           q.push(make_pair(cur->right, currentNode.second - 1));
      }
   }
   return result
}

Obviously, both algorithms has complexity O(n) as they visit every node once. Please free to comment if you see mistakes, or something is not clear. But keep in mind it’s a rather c++ based pseudo code that real solution (althought should be correct). Both solutions works for a more generic case: they return all the nodes at the distance K from a node given as a parameter.

Best Regards

Joe Kidd

Find the only one non-repeating value in a sorted array

There is a sorted array, where every element appears in the array twice, except one, that appears once. The task is to find this element. As an example we may consider [1, 1, 2, 3, 3, 5, 5] and the element that we look for is 2. Can we achieve it in O(log n) and in place?

Let’s consider what we know already about the problem.
(1) The information, that the array is sorted, is pretty useful, as it means all the repeated elements are consecutive (i.e 1,1 or 3,3). Moreover we know that the element, that we are looking for, has to be located somewhere between those continuous 2-elements sequences (i.e 1,1, 2, 3, 3 is correct, but 1, 2, 1, 3, 3 is not).

(2) All the elements appears twice, except one: it means that the size of the array is an odd number.

(3) If there is no extra element, it should contain only elements repeated twice and it should contain each of them. I mean in this case we would have [1, 1, 3, 3], [4, 4, 5, 5] so every occurrence of 1 and 3 appears in the left part. If an extra element is introduced to this part, we would have an inbalance of the amount of occurrences of the last element (it would overlow to the right part). It would be something like [1, 1, 2, 3], [3, 4, 4, 5, 5].

As we have just identified the properties, we can see that the non repeating number is located in the odd-size part of the array. The binary search always divides the search space into two pieces. Dividing an odd size space, gives two subspaces – one of the even size, and the second one of the odd size.

Unfortunately, dividing the array into two subarrays doesn’t give as any information which half is the odd size, and which is the even size. But we can divide the array arbitrary, so that the first half is always even size. Then comparing the last element L of the left subarray to the first element R of the right subarray, is sufficient to establish, in which half the extra element is located. If L != R then we need to check the second half of the array, otherwise the left one.

On the base of the previous paragraph we can develop an algorithm described in the pseudocode below.

int findOneElement(int array[], int size)
{
 if(size == 1)
     return array[0];

 int medium = size/2;
 // make the size even number
 medium = medium % 2 == 0 ? medium : medium + 1;

 if(array == array)
 {
     // look in the left subarray
     return findOneElement(array, medium - 1);
 }
 else
 {
    // look in the right subarray
    return findOneElement(array + medium + 1, size - (medium + 1));
 }

}

The complexity is obviously O(log n) as we are using binary search. Moreover we don’t use any extra memory, so we get O(1). All the requirements are met.

Best Regards

Joe Kidd