K-way merge

There is a certain number if sorted arrays and each of them contains numbers. We want to take an element from every array in such a way, that the difference between the elements is the lowest.

For simplicity let’s say N=3 and the arrays are like:

A = {4, 10, 15, 20}
B = {1, 13, 29}
C = {5, 14, 28}

Now we need to choose one element from each of array, let’s call them a, b, c, such that |a – b| + |b – c| + |c – a| is minimal. For this case the solution is a = 15, b = 13, c = 14. But how to find it?

For solving the problem we are going to use minimal heap data structure with constant size. In our case the size of the heap is equal to number of arrays we process, so the size = 3. In the first step we take the minimal element from the array, and we store the difference of its sum. Now we are going to remove the element from the top of the heap and replace it with the next element from the array the removed element comes from. Inserting the a new element into the heap will rearrange the elements so that the minimal one will be always on the top. We store again the new difference if it’s smaller than the previous one and we repeat until one of the arrays becomes empty. Let’s take a look at an example:

H = {1, 4, 5}, diff = 8
we remove 1 and put 13 as 1 belonged to the same array
H = {4, 5, 13}, diff = 18
we remove 4 and put 10 as 4 belonged to the same array
H = {5, 10, 13}, diff = 16
we remove 5 and put 14 as 5 belonged to the same array
H = {10, 13, 14}, diff = 8
we remove 10 and we put 15 as 10 belonged to the same array
H = {13, 14, 15}, diff = 4

I was inspired to describe this solution, that is called k-way merged, after meeting several questions related to it on careercup – but without a very easy explanation. I hope it helps a bit. Actually it helps at least me 🙂

Best Regards

Joe Kidd


Dutch flag problem – variation

We are given an array and we need to sort it in the way, so that all negative numbers go to the beginning, all positive numbers to the end and the relative position of numbers is not changed. In example: [-1, 4, 0, -2, 2] => [-1, -2, 0, 4, 2].

Basically I have promised to myself to stop updating this blog for a while, as I am kinda busy recently, but this question surprised me and couldn’t imagine there is a such long discussion related to it.

The problem has been described at careercup.com, and I can’t believe no one has noticed such a simple solution. So basically we are given an array of numbers, where some of them are negative and some of them positive. We need to sort the array in the way, negative numbers go first, positive after, but the relative position is not changed. This means that if we have an array [-1,3,-3,2] the answer should be [-1,-3,3,2]. So what we need is a stable sort. However it would be still not enough. But what if we don’t perceive numbers as numerical values, but we assign them one of three types: positive, null, negative? Then we are done actually.

In the other words, let’s assign all the negative numbers the value -1 and all positive numbers value 1. Let 0 be 0 😉 So we may write a following implementation in c++:

bool cmp(int a, int b)
// consider negative numbers as the same class
if(a < 0 && b < 0) return false;
// consider positive numbers as the same class
else if(a > 0 && b > 0) return false;
// otherwise normal comparison will do. This includes
// also the case when a = 0 or b == 0 then we use the normal comparison
else return a < b

void special_sort(vector &amp; special_vector)

So what we did, we just solved this much discussed problem using nothing more than sorting and STL library to get the complexity of O(n logn) that matches the best solutions from careercup.com 🙂

Best Regards

Joe Kidd

Useful sums and formulas

In algorithmic challenges you may very often meet problems that are somehow related to mathematical series, distributions, or combinatorics. As I not always remeber all the formulas I have created a quick reference.

1. Arithmetic Progression and Series

a) Sum
S_n = \frac{n}{2}(a_1 + a_n)
b) N-th element of the progression
a_n = a_1 + (n-1)d
Where r stands for the difference

2. Geometric Progression and Series

a) Sum
S_n = a_1\frac{(1 - r^n)}{1-r}
b) N-th element of progression
a_n = a_1 * r ^ n
Where r stands for ratio.

3. Gauss sum
\sum\limits_{k=1}^n k = \frac{n(n + 1)}{2}

4. Gauss series
\sum\limits_{k=1}^{\infty} (k+1)z^k = \frac{1}{(1 - z)^2}

5. Sum of squares
\sum\limits_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}

6. Factorials sum
\sum\limits_{k=0}^{n} k(k+1) = \frac{n(n+1)(n+2)}{3}

7. Binomial theorem
\sum\limits_{k=0}^{n}  \binom {n} {k}x^ky^{n-k} = (x+y)^n

8. Harmonic function
ln( \frac{1}{1-z})= \sum\limits_{k=0}^{\infty}(z^k/k)

In the next article we will summarise probability.

Best Regards

Joe Kidd

Check if for a given matrix size, you are able to visit every cell

This problem was described in TopCoder SRM 594, but it’s simplification is described above. To sum up, we are given a matrix of size N, M and we need to check if it’s feasible to visit every cell in the matrix, traveling from cell to cell according to a simple rule: from position [i, j] we may go only to [i+1, j+1]. This problem is REALLY cool and strongly encourage you to read the solution when you got here!

The simplest solution would be just to create such a matrix and then start visiting it, until we realize we can’t visit it completely, or we do it. What should be a starting position of such a traversal? As we need to visit all the cells, we may choose randomly any of them as a starting position. However the solution would work, it’s not satisfactory as the complexity is O(M*N), what sucks.

Instead of thinking about the algorithm it would be good to stop here for a while and wonder when the matrix might be fully visited.

Did you really think? Ok, so then surely you know the solution. Anyway I will provide it here for those who cheated. The answer is: when the GCD of M, N is 0. Why? Because in this case the Lowest Common Multiple will be equal to M*N. And add it this information to your heart. What does it mean for us? It means the first element that is going to be visited second time, will be visited after M*N moves. As the size of the matrix is M*N, it means that all the elements of the matrix will be visited before we will reach an element the second time. Otherwise, when we meet an element already visited, before M*N steps we are sure that skipped some cells and we will never reach them, as the traversing methods remains the same.

Best Regards

Joe Kidd