Reservoir Sampling

There is a huge set S of numbers of size N that can’t fit in the memory. Choose K of elements from the stream randomly, with equal probability of being selected.

Without to much introduction, let’s just say that the answer to the problem is Reservoir Sampling. An example code could look like:

std::vector<int> reservoir_sampling(Stream S; unsigned long long N, unsigned int K){
 unsigned long long i = 0;
 std::vector<int> reservoir;
 
 // fill the reservoir
 for(; i < K; ++i)
  reservoir.push_back(S.next());

 // replace if needed
 for(; i < N; ++i){
  int next_token = S.next();
  unsigned int r = rand() % i;
  if(r < K)reservoir[i] = next_token;
 }
 return reservoir;
}
Advertisements

Counting factorials of relatively small numbers

We are given a big number, let’s say 345. It’s not big? It is, try to calculate the factorial. I will present a solution that might be required in the problem, however it’s mainly used for big number multiplication.

This solution is quite tricky but very easy to implement and explain. I think it’s worth being aware of it as it looks simple enough to be a problem for an interview. We may just keep the big number in an array and calculate the multiplication in the way mentioned below. So let’s say we have two numbers A = 345 B = 15, and A is kept in an array A = [5, 4, 3]. Now we multiply in the following way:

carry = 0;
Step 1:
array[0] = (array[0] * B + carry) % 10 =
 (5 * 15 + 0 ) % 10 = 75 % 10 = 5
carry = (array[0] * B + carry) / 10 = 75/10 = 7
Step 2:
array[1] = (array[1] * B + carry) % 10 = 
 (4* 15 + 7 ) % 10 = 67 % 10 = 7
carry = (array[1] * B + carry) / 10 = 67/10 = 6
Step 3:
array[2] = (array[2] * B + carry) % 10 = 
 (3* 15 + 6 ) % 10 = 51 % 10 = 1
carry = (array[2] * B + carry) / 10 = 51/10 = 5

Because carry is greater than zero we need to add it to the array and the result array is [5, 7, 1, 5], what gives a number 5175. Please check if the multiplication result is correct indeed.

Best Regards

Joe Kidd

Longest Arithmetic Progression

Input: A sorted array of numbers.
Output: The length of the longest arithmetic progression
Problem: We are given a sorted array and we need to find the length of the longest arithmetic progression..
Example: [1, 9, 10, 11, 16, 21, 28, 30, 31], answer=4.
Complexity: Time O(n^2), Memory 0(n^2)

Before reading this article, it’s strongly recommended to take a look at this one, as it describes the basic idea we gonna use in the solution.

The solution will base on a dynamic programming approach, as the problem has a property of optimal substructure – we can build arithmetic progression of length N if we already know one with lenght (N-1).

Let’s consider three indices L, M, R, L < M < R, and let’s use the property describing arithmetic sequence mentioned in the previous article. If we know three indices L,M,R that are in a arithmetic progression, and we know that M, R, Q indices are also in the aritmetic progression, then L,M,R,Q are aslo in arithmetic progression. We will use this property to create dynamic programming solution.

Another thing worth considering is that every pair of elements is a arithmetic progression already, what gives us initialization property of an array.

We will have A[L][M] array, where we will store the longest progression length, starting with L, M. We will process the progression from right to left, trying to find L and R for a fixed value of M. The code may look like this:

int longestProgression(vector<int> iProgression)
{
    int A[iProgression.size()][iProgression.size()];
    int maxLen = 2;
    // initialize last columnt values
    for(int i = 0; i < iProgression.size(); i++)
        for(int j = 0; j < iProgression.size(); j++)
            A[i][j] = 2;

    for(int M = iProgression.size() - 1; M > 0; M--)
    {
       int L = M - 1;
       int R = M + 1;
       while(L >= 0 && R < iProgression.size())
       {
           int LR = iProgression[L] + iProgression[R];
           int M2 = IProgression[M] * 2;
           if(LR > M2)
           {
               L--;
           }
           else if(LR < M2)
           {
               R++;
           }
           else
           {
               A[L][M] = A[M][R] + 1;
               maxLen = max(A[L][M], maxLen);
               L--; R++;
           }
       }
    }
    return maxLen;
}

The complexity of the solution is O(n^2) and the memory is O(n^2) too.

Find 3-elements long arithmetic progression in a sorted array

So we are given an array that is sorted in ascending order, and we are expected to find 3-elements that are arithmetic progression. As the O(n^3) solution is trivial we are expected to find a better one. Obviously the elements are spread in the array, but the order is maintained.

As an example we may consider the array [1, 9, 10, 11, 16, 21, 30] that has the arithmetic progression [1,11,13]. The trick that is going to lead to the O(n^2) solution bases on the property of an arithmetic progression, that says: 3 elements L, M, R are arithmetic progression when L < M < R. and 2 * M = L + R.

Cool! As the array is sorted L lays on the left hand side of M, and R on the right hand side. So for a fixed M, we need to find L and R meeting the condition above. We may firstly fix L = M – 1, and R = M + 1 and the iterate updating the values of L and R according to the following rule:

if array[L] + array[P] > 2 * array[M] then L--
if array[L] + array[P] < 2 * array[M] then P++
if array[L] + array[P] == 2 * array[M] then we have the answer

What leads us to a implementation of O(n^2).

bool hasArithmeticProg(vector<int> array)
{
    if(array.size() < 3)
        return false;
    
    for(int M = 1; M < array.size() - 1; M++)
    {
        int L = M - 1;
        int R = M + 1;
        while(L >= 0 && R < array.size())
        {
            if(array[L] + array[R] == 2 * array[M])
                return true;
            else if(array[L] + array[R] > 2 * array[M])
                L--;
            else R++;
        }
    }
    return false;
}

The problem is not really about finding the arithmetic progression, is rather about efficient way of finding if there are two elements in a sorted array that sum up to a given number. The arithmetic progression is just an extra thing to make it more funny.

Best Regards

Joe Kidd

One-sided (meta) binary search

The concept described in this article changes a little bit the way we look at the binary search. I have met also a different name of this approach on topcoder pages, where it’s called meta-binary search. The second time I met this algorithm in the Skiena’s book “Algorithms Manual Design”, where one-sided binary search is used to describe it.

Honestly, I prefer the Skiena’s name, as it seems to be more accurate and describe what really happens. The classical binary search algorithm operates on a sub-range of values, like implemented on the related wiki page. As you can see, there is an upper and lower boundary provided, defining the sub-range of the array to be searched. Instead of considering the boundaries and finding the mid point, we can actually consider only the mid point – and this is what the meta binary search does. As an example is better than precept, let’s take a look at some.

Let’s say we have a sorted array and we need to find a given value. Just not to create messy examples, let’s consider the simplest one we can imagine: [1,2,3,4,5,6,7]. If we look for a number six, it will be find at the position 5 (0-indexed array).

In the first step we need to find a value of log N, that will be useful for comparing elements and choosing the proper part of the array to analyse. Of course log is 2-based. We can do it using simple iteration, like this:

int getLogOf(int N)
{
    int log2 = 0
    while((1 << log2) <= N)log2++;
    return --log2;
}

What this value really does, it is explained in the listening below, but in the “human being” words, we may define it as a distance to the next mid element.

Then we can run the search itself, using the following piece of code:

int find(int array[], int N, int K)
{
    int log2 = getLogOf(N);
    int currentPosition = 0;
    while(log2 >= 0)
    {
       // check if we got a solution
       // if yes, the return it
       if(array[curPosition] == K)
           return curPosition;
       // check the next postion
       int newPosition = curPosition + (1 << log2);
       // if it's smaller or equal then we have to
       // update the current position
       if(newPosition < N && array[newPosition] <= K)
           curPosition = newPosition;
       // decrement the skip size
       log2--;
    }
return -1;
}

So we have an array [1,2,3,4,5,6,7], and we look for 6. Because the size of the array is 7, we know that the log base 2 of 7 is 2 (2^2 = 4, but 2^3 = 8, so we take 2 as the less evil). Let’s see the iterations, the bold element indicates currently considered point:

Iter1: [1,2,3,4,5,6,7]
  log=2, currentPosition = 0, newPosition = 0 + 2 ^ 2 = 4, array[newPosition] = 5 < 6
  currentPosition = newPosition
  log--
Iter2: [1,2,3,4,5,6,7]
  log=1, currentPosition = 4, newPosition = 4 + 2 ^ 1 = 6, array[newPosition] = 7 > 6
  log--
Iter2: [1,2,3,4,5,6,7]
  log=1, currentPosition = 4, newPosition= 4 + 2 ^ 0 = 5, array[newPosition] = 6 == 6
  return 5

I hope this article makes you a little bit more familiar with the concept of one sided binary search, that is a very handy tool – check the top coder lowest common ancestor article to know why!

Best Regards

Joe Kidd

Find the K-closest point to the origin

There is an unsorted array containing the coordinates of points in a 2D space and a positive number k. We are ask to find the the k-closest point to the origin, it is to the point with coordinates (0, 0). In example for an array [(5,5), (-4,3), (3,2), (1,0), (6, -5)], and k = 2, the answer would be (3, 2).

The first approach, we could use, is sorting. I mean firstly storing the distance to the origin in an array by extending the point structure, and then sorting it according to the newly added field. After that, we could find the answer in the k-th cell of the array. This solution would O(n + n log n) and unfortunately it’s not optimal. There is O(n) solution.

The problem is a special case of the quickselect algorithm, that uses the same partition procedure, as quicksort. But before we can apply the quickselect approach, we need to do some extra pre-processing related to distance calculation.

Let’s suppose the structure of our Point, except coordinates x, y, contains also an extra value called distance. So we get a code like this:

struct Point
{
  double x, y;
  double distance;
};

Now, in the first step of the algorithm, we need to calculate the distance from the origin to each point contained in the array.

void setDistanceFromOrigin(Point points[], int size)
{
    for(int i = 0; i < size; i++)
        points[i].distance = sqrt(points[i].x * points[i].x +
                    points[i].y * points[i].y);
}

Then we can apply the quickselect algorithm for finding the k-smallest element, using the calculated distance for comparison. So far it sounds like quicksort. But in this case we are not going to sort the whole array, but only the part, where the element of interests lays. And actually sorting is a little bit too strong word here. We will pivot the array around the mid point so that lower elements will be located on the left, and larger on the right hand side of the pivot. Then we will consider only the subarray where the k-th element is located. However the partition looks like in the quicksort algorithm:

int partition(vector<Point> & points, int l, int r, int m)
{  
   double pivot = points[m].distance;
   swap(points[m], points[r]); m = r; r--;

   while(l < r)
   {
      if(points[l].distance >= pivot){
         swap(points[l], points[r--]);
      }
      else l++;
   }
   swap(points[l], points[m]); m = l;
   return l;
}

The difference between quickselect and quicksort lays in the calling procedure, that in case of quicksort is called for both parts of the array, but in case of quickselect only for the part were the elements of interest is located:

Point & quickselect(vector<Point> points, int l, int r, int k)
{
    int m = (l + r) / 2;
    m = partition(points, medium);
    if(m == k)
         return points[m];
    if(pivot < K) 
        return findKClosestPoint(points, m + 1, r, k);
    else 
        return findKClosestPoint(points, l, m - 1, k);
}

This algorithm has O(n) complexity and doesn’t require extra space, so it’s quite efficient.

Best Regards

Joe Kidd

Find the only one non-repeating value in a sorted array

There is a sorted array, where every element appears in the array twice, except one, that appears once. The task is to find this element. As an example we may consider [1, 1, 2, 3, 3, 5, 5] and the element that we look for is 2. Can we achieve it in O(log n) and in place?

Let’s consider what we know already about the problem.
(1) The information, that the array is sorted, is pretty useful, as it means all the repeated elements are consecutive (i.e 1,1 or 3,3). Moreover we know that the element, that we are looking for, has to be located somewhere between those continuous 2-elements sequences (i.e 1,1, 2, 3, 3 is correct, but 1, 2, 1, 3, 3 is not).

(2) All the elements appears twice, except one: it means that the size of the array is an odd number.

(3) If there is no extra element, it should contain only elements repeated twice and it should contain each of them. I mean in this case we would have [1, 1, 3, 3], [4, 4, 5, 5] so every occurrence of 1 and 3 appears in the left part. If an extra element is introduced to this part, we would have an inbalance of the amount of occurrences of the last element (it would overlow to the right part). It would be something like [1, 1, 2, 3], [3, 4, 4, 5, 5].

As we have just identified the properties, we can see that the non repeating number is located in the odd-size part of the array. The binary search always divides the search space into two pieces. Dividing an odd size space, gives two subspaces – one of the even size, and the second one of the odd size.

Unfortunately, dividing the array into two subarrays doesn’t give as any information which half is the odd size, and which is the even size. But we can divide the array arbitrary, so that the first half is always even size. Then comparing the last element L of the left subarray to the first element R of the right subarray, is sufficient to establish, in which half the extra element is located. If L != R then we need to check the second half of the array, otherwise the left one.

On the base of the previous paragraph we can develop an algorithm described in the pseudocode below.

int findOneElement(int array[], int size)
{
 if(size == 1)
     return array[0];

 int medium = size/2;
 // make the size even number
 medium = medium % 2 == 0 ? medium : medium + 1;

 if(array == array)
 {
     // look in the left subarray
     return findOneElement(array, medium - 1);
 }
 else
 {
    // look in the right subarray
    return findOneElement(array + medium + 1, size - (medium + 1));
 }

}

The complexity is obviously O(log n) as we are using binary search. Moreover we don’t use any extra memory, so we get O(1). All the requirements are met.

Best Regards

Joe Kidd