K-way merge

There is a certain number if sorted arrays and each of them contains numbers. We want to take an element from every array in such a way, that the difference between the elements is the lowest.

For simplicity let’s say N=3 and the arrays are like:

A = {4, 10, 15, 20}
B = {1, 13, 29}
C = {5, 14, 28}

Now we need to choose one element from each of array, let’s call them a, b, c, such that |a – b| + |b – c| + |c – a| is minimal. For this case the solution is a = 15, b = 13, c = 14. But how to find it?

For solving the problem we are going to use minimal heap data structure with constant size. In our case the size of the heap is equal to number of arrays we process, so the size = 3. In the first step we take the minimal element from the array, and we store the difference of its sum. Now we are going to remove the element from the top of the heap and replace it with the next element from the array the removed element comes from. Inserting the a new element into the heap will rearrange the elements so that the minimal one will be always on the top. We store again the new difference if it’s smaller than the previous one and we repeat until one of the arrays becomes empty. Let’s take a look at an example:

H = {1, 4, 5}, diff = 8
we remove 1 and put 13 as 1 belonged to the same array
H = {4, 5, 13}, diff = 18
we remove 4 and put 10 as 4 belonged to the same array
H = {5, 10, 13}, diff = 16
we remove 5 and put 14 as 5 belonged to the same array
H = {10, 13, 14}, diff = 8
we remove 10 and we put 15 as 10 belonged to the same array
H = {13, 14, 15}, diff = 4
...

I was inspired to describe this solution, that is called k-way merged, after meeting several questions related to it on careercup – but without a very easy explanation. I hope it helps a bit. Actually it helps at least me 🙂

Best Regards

Joe Kidd

Advertisements

Find all nodes in a binary tree that are at the distance K from the root

Input: A binary tree, number K
Output: A list of all nodes that are at the distance K from the root
Problem: Find all nodes in a binary tree that are at the distance K from the root
Complexity: Time O(n), Memory 0(n)
Example:

     
L:0       A
...    /     \
L:1   B       C
...   \     / \
L:2    D   E   F
...   / \
L:3  G   H


For K = 2, we got [D, E, F], L:N stands for the level N.

Firstly, let’s assume our tree node is defined in the following way:

struct Node
{
    struct Node * left;
    struct Node * right;
    int value;
};

Basically what we are looking for is the level K, as all nodes on this level will be at the distance of K from the root. In case of this problem there is really not much to think. The only choice we need to make is if go for breadth-first-search, or depth-first-search algorithm. Then we go to the distance K from our root node and we add the nodes at this level to the result. Below you can find both of the algorithms implemented.

DFS approach:

void findKFarFromRoot(Node * node, int K, vector result)
{
    if(!node)return;
    if(K == 0)
       result.append(node);
    else {
       findKFarFromRoot(node->left, K - 1, result);
       findKFarFromRoot(node->right, K - 1, result);
    }
}

BFS approach:

vector findKFarFromRoot(Node * root, int K)
{
   vector<pair<Node *, int> > result;
   if(!root)
     return result;

   queue q;
   q.push(make_pair(root, K));

   while(!q.empty())
   {
      pair currentNode = q.front(); q.pop();
      if(currentNode.second == 0)
        result.push_back(currentNode);
      else
      {
        // add left and right if they exist, and decrement the
        // distance
        Node * cur = currentNode.first;
        if(cur->left)
           q.push(make_pair(cur->left, currentNode.second - 1));
        if(cur->right)
           q.push(make_pair(cur->right, currentNode.second - 1));
      }
   }
   return result
}

Obviously, both algorithms has complexity O(n) as they visit every node once. Please free to comment if you see mistakes, or something is not clear. But keep in mind it’s a rather c++ based pseudo code that real solution (althought should be correct). Both solutions works for a more generic case: they return all the nodes at the distance K from a node given as a parameter.

Best Regards

Joe Kidd