Reservoir Sampling

There is a huge set S of N numbers that can’t fit in memory. Choose K elements from the stream randomly, with equal probability of being selected.

Without too much introduction, let’s just say that the answer to the problem is Reservoir Sampling. An example code could look like:

std::vector<int> reservoir_sampling(Stream &S, unsigned long long N, unsigned int K){
 unsigned long long i = 0;
 std::vector<int> reservoir;
 
 // fill the reservoir with the first K elements
 for(; i < K; ++i)
  reservoir.push_back(S.next());

 // for every further element, replace a random slot with probability K / (i + 1)
 for(; i < N; ++i){
  int next_token = S.next();
  unsigned long long r = rand() % (i + 1);
  if(r < K) reservoir[r] = next_token;
 }
 return reservoir;
}
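As a side note, rand() % (i + 1) introduces a slight modulo bias. Below is a minimal sketch of the same idea using the <random> header instead; the name reservoir_sampling_unbiased and the Stream template parameter (anything with a next() method, as above) are just illustrative assumptions.

#include <random>
#include <vector>

// a sketch: same reservoir sampling, but with an unbiased random draw
template <typename Stream>
std::vector<int> reservoir_sampling_unbiased(Stream &S, unsigned long long N, unsigned int K)
{
    std::vector<int> reservoir;
    std::mt19937_64 gen{std::random_device{}()};

    unsigned long long i = 0;
    // fill the reservoir with the first K elements
    for(; i < K && i < N; ++i)
        reservoir.push_back(S.next());

    // replace a random slot with probability K / (i + 1)
    for(; i < N; ++i)
    {
        int next_token = S.next();
        std::uniform_int_distribution<unsigned long long> dist(0, i);
        unsigned long long r = dist(gen);
        if(r < K)
            reservoir[r] = next_token;
    }
    return reservoir;
}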

Check if for a given matrix size, you are able to visit every cell


This problem was described in TopCoder SRM 594, but its simplification is described above. To sum up, we are given a matrix of size N x M and we need to check if it’s feasible to visit every cell in the matrix, traveling from cell to cell according to a simple rule: from position [i, j] we may go only to [i+1, j+1], wrapping around the edges of the matrix. This problem is REALLY cool and I strongly encourage you to read the solution once you get here!

The simplest solution would be just to create such a matrix and then start visiting it, until we realize we can’t visit it completely, or we do. What should be the starting position of such a traversal? As we need to visit all the cells, we may choose any of them as a starting position. However, even though this solution would work, it’s not satisfactory, as the complexity is O(M*N), which sucks.

Instead of thinking about the algorithm it would be good to stop here for a while and wonder when the matrix might be fully visited.

Did you really think? OK, so then surely you know the solution. Anyway, I will provide it here for those who cheated. The answer is: when the GCD of M and N is 1. Why? Because in this case the Least Common Multiple will be equal to M*N. Keep this fact in mind. What does it mean for us? It means the first element that is going to be visited a second time will be visited after M*N moves. As the size of the matrix is M*N, it means that all the elements of the matrix will be visited before we reach any element a second time. Otherwise, when we meet an already visited element before M*N steps, we are sure that we skipped some cells and we will never reach them, as the traversal rule remains the same.
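Assuming the traversal really does wrap around the edges, as described above, the whole check boils down to a single gcd test. A minimal sketch using std::gcd from C++17 (the function name is just illustrative):

#include <numeric> // std::gcd (C++17)

// true when the wrapping diagonal walk visits every cell of an N x M matrix
bool canVisitAllCells(long long N, long long M)
{
    return std::gcd(N, M) == 1;
}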

Best Regards

Joe Kidd

Greatest Common Divisor of a sequence of numbers

It’s well known how to calculate the greatest common divisor of two numbers. But what if we are considering a sequence of them and we are expected to return its gcd?

This is probably not necessary, but just to keep things in order, the greatest common divisor algorithm might be defined in the following way:

gcd(A, B) = gcd(B, A % B)
gcd(A, 0) = A

The solution of the problem lies in a property of the greatest common divisor. The gcd is associative, which might be expressed as:

gcd([A,B,C,D, ..., Z]) = gcd(gcd(gcd(gcd(gcd(A,B), C), D), ...), Z)

A, B, C, D… are some integers. Done 🙂
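Just to make it concrete, here is a minimal sketch of folding gcd over a sequence, using std::gcd from C++17 (the function name is just illustrative):

#include <numeric> // std::gcd
#include <vector>

// gcd of a whole sequence, folded from left to right
long long gcdOfSequence(const std::vector<long long>& xs)
{
    long long g = 0; // gcd(0, x) == x, so 0 is a neutral starting value
    for (long long x : xs)
        g = std::gcd(g, x);
    return g;
}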

Best Regards

Joe Kidd

Convert a mathematical expression into RPN

This is a very common technique and actually it has been described in thousands of places. We are given a string representing a mathematical expression, and we are expected to convert it into Reverse Polish Notation, which makes evaluation way easier. As an example let’s consider the following expression 2 + 5 * 4, which in RPN has the form 2 5 4 * +.

So we have an expression given in infix notation and we need to convert it to RPN. Not to make the problem too difficult, we assume there are no function operators, no separators and no unary minus. So basically we have only positive numbers, parentheses and the basic arithmetic operations +, -, *, /. And what is most important, we assume the expression is correct, so there is no need to validate it (even if that was pretty simple, it’s not a part of the problem).

The conversion algorithm was created by Dijkstra and is called the Shunting-yard algorithm. The algorithm uses two data structures, an output queue Q and an operator stack S:

while there are tokens to read
    t = the next token
    if isNumber(t) then Q.push(t)
    if isOperator(t) then
        while isOperator(S.top())
              and priority(S.top()) >= priority(t) then
           Q.push(S.pop());
        S.push(t);
    if isLeftParenthesis(t) then
       S.push(t);
    if isRightParenthesis(t) then
       while ! isLeftParenthesis(S.top()) then
          Q.push(S.pop());
       S.pop() // discard the left parenthesis
while S.empty() == false
    Q.push(S.pop());

An interesting part is the inner while loop in the operator case, where we consider the order of operations. As we know, * and / have higher priority than + and -, so they should be executed first. That is why we pop them from the stack and put them into the queue before pushing the new operator (note the >=: operators of equal priority are left-associative here, so they have to be popped as well).
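To see why RPN makes the evaluation so easy, here is a minimal sketch of evaluating the result with a single stack. It assumes the output queue has been joined into a space-separated string and that the expression is valid, as above; the name evaluate_rpn is just illustrative. For example, evaluate_rpn("2 5 4 * +") gives 22.

#include <cctype>
#include <sstream>
#include <stack>
#include <string>

// evaluates a space-separated RPN expression, e.g. "2 5 4 * +" -> 22
int evaluate_rpn(const std::string& rpn)
{
    std::istringstream in(rpn);
    std::stack<int> values;
    std::string token;
    while(in >> token)
    {
        if(std::isdigit(static_cast<unsigned char>(token[0])))
        {
            values.push(std::stoi(token));
        }
        else
        {
            // pop the two operands; note the order matters for - and /
            int rhs = values.top(); values.pop();
            int lhs = values.top(); values.pop();
            if(token == "+") values.push(lhs + rhs);
            else if(token == "-") values.push(lhs - rhs);
            else if(token == "*") values.push(lhs * rhs);
            else if(token == "/") values.push(lhs / rhs);
        }
    }
    return values.top();
}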

Best Regards

Joe Kidd

Find 3-elements long arithmetic progression in a sorted array

So we are given an array sorted in ascending order, and we are expected to find 3 elements that form an arithmetic progression. As the O(n^3) solution is trivial, we are expected to find a better one. Obviously the elements may be spread across the array, but their order is maintained.

As an example we may consider the array [1, 9, 10, 11, 16, 21, 30], which contains the arithmetic progression [1, 11, 21]. The trick that is going to lead to the O(n^2) solution is based on a property of an arithmetic progression: three elements at indices L < M < R form an arithmetic progression when 2 * array[M] = array[L] + array[R].

Cool! As the array is sorted, L lies on the left-hand side of M, and R on the right-hand side. So for a fixed M, we need to find L and R meeting the condition above. We may first set L = M - 1 and R = M + 1, and then iterate, updating the values of L and R according to the following rules:

if array[L] + array[R] > 2 * array[M] then L--
if array[L] + array[R] < 2 * array[M] then R++
if array[L] + array[R] == 2 * array[M] then we have the answer

This leads us to an implementation of O(n^2).

#include <vector>
using std::vector;

bool hasArithmeticProg(const vector<int>& array)
{
    if(array.size() < 3)
        return false;
    
    for(int M = 1; M < (int)array.size() - 1; M++)
    {
        int L = M - 1;
        int R = M + 1;
        while(L >= 0 && R < (int)array.size())
        {
            if(array[L] + array[R] == 2 * array[M])
                return true;
            else if(array[L] + array[R] > 2 * array[M])
                L--;
            else R++;
        }
    }
    return false;
}

The problem is not really about finding the arithmetic progression; it is rather about an efficient way of finding whether there are two elements in a sorted array that sum up to a given number. The arithmetic progression is just an extra thing to make it more fun.

Best Regards

Joe Kidd

Bit tricks and tricky questions

These problems pretty often appear in interviews. I had them spread across lots of places, so just to order things a little bit, I gathered them in one place. On the other hand, they are small enough that they don’t deserve a separate post.

1. Add two numbers without using arithmetic operations.
2. The lowest of three numbers without using comparison
3. Multiply by seven without using multiplication operator.
4. Swap of the numbers without using an extra variable
5. Check if a number is a power of 2

Problem: Add two numbers without using arithmetic operations.

We need to note that the XOR of two numbers contains their sum if there is no carry anywhere. E.g.

5 = 101
2 = 010
————— XOR
7 = 111

So we may use this property, remembering to consider the carry at every step.

int add(int x, int y)
{
    // dummy value, just to enter the loop
    int carry = 1;
    // loop while we still have a carry
    while(carry)
    {
        // all the positions where a carry appears
        carry = x & y;
        // the sum without the carry
        x = x ^ y;
        // move the carry one position to the left
        // and try to add again
        y = carry << 1;
    }
    return x;
}

Problem: The lowest of three numbers without using comparison

What about this solution (assuming the numbers are non-negative)?

int lowestOfThree(int x, int y, int z)
{
    int c = 0;
    while(x && y && z)
    {
        x--; y--; z--;
        c++;
    }
    return c;
}

The other solution uses only bit operations, which makes it even more interesting. Obviously, if we know a way to choose the smaller of two numbers, we can extend it to three of them - see the sketch below. For two numbers the solution could be:

// mask will contain only 0s, or only 1s, depending on
// whether the difference between x and y is positive
// or negative (check the two's complement representation).
int mask = (x - y) >> (sizeof(int) * CHAR_BIT - 1);
// if the mask contains only zeros it means that x >= y,
// so (x - y) disappears and we are left with y.
// if the mask contains only 1s, then the difference
// remains and we get y + x - y = x.
int min = y + ((x - y) & mask);
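Extending this to three numbers is then just a matter of applying the two-number trick twice. A minimal sketch (the function names are just illustrative; it assumes two's complement and that x - y does not overflow):

#include <climits> // CHAR_BIT

// branch-free minimum of two numbers
int min2(int x, int y)
{
    int mask = (x - y) >> (sizeof(int) * CHAR_BIT - 1);
    return y + ((x - y) & mask);
}

// minimum of three numbers without comparisons
int min3(int x, int y, int z)
{
    return min2(x, min2(y, z));
}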

Problem: Multiply by seven without using multiplication operator.

// multiply by 8
int x8 = x << 3;
int x7 = x8 - x;


Problem: Swap two numbers without using an extra variable


a = b - a; // a now holds (b - a)
b = b - a; // b - (b - a) = a, so b now holds the original a
a = a + b; // (b - a) + a = b, so a now holds the original b


Problem: Check if a number is a power of 2


return (N & (N - 1)) == 0; // assumes N > 0, as 0 would also pass this test

Find a permutation knowing its rank

Input: A number M, describing the rank of a permutation of N elements
Output: The permutation with the given rank
Problem: We are given a rank and we need to find the corresponding permutation
Example: rank=0 p=123, rank=1 p=132, rank=2 p=213, etc.

In this article we have discussed how to find a rank of a permutation. Now let’s try to do the opposite thing: knowing the rank, find the proper permutation.

We will use the idea described in Skiena’s book, “The Algorithm Design Manual” (if you don’t have it yet, go and buy it, as it’s a nice source of algorithmic problems and Mr. Skiena seems to be a clever guy).

So we have a function Unrank(M, N) that, according to the definition of the problem, finds the permutation of N elements at position M, given some order of generation.

Let me recall the Rank function from that article, which is given in the following way:

Rank(p) = (p_1 - 1) * (|p| - 1)! + Rank(p_2,...,p_{|p|})

How can we recover the permutation from the given rank number? The part (p_1 - 1) * (|p| - 1)! tells us how many permutations of size |p| appear before we get the value p_1 at the first position. Then we proceed with the next position by a recursive call. In this case we have to find the maximum number X for p_1 such that there will be at most M permutations in front of it. So the following condition has to be met:

(X - 1) * (N - 1)! <= M

Having found this number, we update the ordering, as was previously described in the rank algorithm, and then we proceed with a problem of smaller size.

Let’s take an example to see if we correctly understand the problem. Let’s say we are looking for M = 3, N = 3.

Step 1.


Unrank(3, 3):
Ordering [1,2,3] => [1,2,3]
X = 3, then (3 - 1) * (3 - 1)!  = 4 > 3, condition not met
X = 2, then (2 - 1) * (3 - 1)! = 2 <= 3, condition met, so p1 = 2
Update the ordering by removing 2: [1,2,3] => [1,3], relabeled as [1,2], and the remaining rank becomes M = 3 - 2 = 1

Step 2.

Unrank(1, 2):
Ordering [1,3] => [1,2]
X = 2, then (2 - 1) * (2 - 1)! = 1 <= 1, condition met, so p2 = 3
Update the ordering by removing it: [1,2] => [1], and the remaining rank becomes M = 1 - 1 = 0

Step 3.

Unrank(0, 1):
Ordering [1] = [1]
As we are looking for the element at rank 0 in a 1-element ordering, it will be
exactly this element, so p3 = 1

So the given permutation is p = [2, 3, 1], and using the rank algorithm we can check that this answer is indeed correct. Note how the M parameter also changes with the recursion: if the chosen first element has Y = (X - 1) * (N - 1)! permutations in front of it, then in the next step M = max(M - Y, 0).
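Putting it all together, here is a minimal sketch of the Unrank procedure, assuming 0-based ranks (rank 0 is the identity permutation) and M < N!; the function name is just illustrative. With these assumptions unrank(3, 3) reproduces the [2, 3, 1] from the example above.

#include <vector>

// a sketch assuming 0-based ranks: unrank(0, N) is the identity permutation
std::vector<int> unrank(unsigned long long M, int N)
{
    // ordering[i] is the (i+1)-th smallest element not used yet
    std::vector<int> ordering(N);
    for(int i = 0; i < N; ++i)
        ordering[i] = i + 1;

    // factorials: fact[i] = i!
    std::vector<unsigned long long> fact(N, 1);
    for(int i = 1; i < N; ++i)
        fact[i] = fact[i - 1] * i;

    std::vector<int> p;
    for(int size = N; size >= 1; --size)
    {
        // largest X such that (X - 1) * (size - 1)! <= M
        unsigned long long X = M / fact[size - 1] + 1;
        p.push_back(ordering[X - 1]);
        ordering.erase(ordering.begin() + (X - 1));
        M -= (X - 1) * fact[size - 1];
    }
    return p;
}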

Best Regards

Joe Kidd

One-sided (meta) binary search

The concept described in this article changes a little bit the way we look at binary search. I have also met a different name for this approach on TopCoder pages, where it’s called meta-binary search. The second time I met this algorithm was in Skiena’s book “The Algorithm Design Manual”, where the name one-sided binary search is used to describe it.

Honestly, I prefer Skiena’s name, as it seems to be more accurate and describes what really happens. The classical binary search algorithm operates on a sub-range of values, like the implementation on the related wiki page. As you can see, there is an upper and a lower boundary provided, defining the sub-range of the array to be searched. Instead of keeping the boundaries and finding the mid point, we can actually keep only the current position and a jump size - and this is what the meta binary search does. As an example is better than a precept, let’s take a look at one.

Let’s say we have a sorted array and we need to find a given value. Just not to create messy examples, let’s consider the simplest one we can imagine: [1,2,3,4,5,6,7]. If we look for the number six, it will be found at position 5 (0-indexed array).

In the first step we need to find the value of log N, which will be useful for comparing elements and choosing the proper part of the array to analyse. Of course the log is base 2, rounded down. We can compute it using a simple iteration, like this:

int getLogOf(int N)
{
    int log2 = 0;
    while((1 << log2) <= N)log2++;
    return --log2;
}

What this value really does is explained in the listing below, but in plain words, we may describe it as the initial distance to the next probed element.

Then we can run the search itself, using the following piece of code:

int find(int array[], int N, int K)
{
    int log2 = getLogOf(N);
    int curPosition = 0;
    while(log2 >= 0)
    {
       // check if we got a solution
       // if yes, then return it
       if(array[curPosition] == K)
           return curPosition;
       // check the next position
       int newPosition = curPosition + (1 << log2);
       // if the value there is smaller or equal, then we have to
       // update the current position
       if(newPosition < N && array[newPosition] <= K)
           curPosition = newPosition;
       // decrement the skip size
       log2--;
    }
    // the last jump may have landed exactly on K,
    // so check the final position as well
    return array[curPosition] == K ? curPosition : -1;
}

So we have an array [1,2,3,4,5,6,7], and we look for 6. Because the size of the array is 7, we know that the floor of log base 2 of 7 is 2 (2^2 = 4, but 2^3 = 8, so we take 2 as the lesser evil). Let’s see the iterations:

Iter1: [1,2,3,4,5,6,7]
  log=2, curPosition = 0, newPosition = 0 + 2 ^ 2 = 4, array[newPosition] = 5 <= 6
  curPosition = newPosition
  log--
Iter2: [1,2,3,4,5,6,7]
  log=1, curPosition = 4, newPosition = 4 + 2 ^ 1 = 6, array[newPosition] = 7 > 6
  log--
Iter3: [1,2,3,4,5,6,7]
  log=0, curPosition = 4, newPosition = 4 + 2 ^ 0 = 5, array[newPosition] = 6 <= 6
  curPosition = newPosition
  log--
After the loop: array[curPosition] = 6 == 6, so we return 5

I hope this article makes you a little bit more familiar with the concept of one-sided binary search, which is a very handy tool - check the TopCoder lowest common ancestor article to know why!

Best Regards

Joe Kidd