Reservoir Sampling

There is a huge set S of numbers of size N that can’t fit in the memory. Choose K of elements from the stream randomly, with equal probability of being selected.

Without to much introduction, let’s just say that the answer to the problem is Reservoir Sampling. An example code could look like:

std::vector<int> reservoir_sampling(Stream S; unsigned long long N, unsigned int K){
 unsigned long long i = 0;
 std::vector<int> reservoir;
 
 // fill the reservoir
 for(; i < K; ++i)
  reservoir.push_back(S.next());

 // replace if needed
 for(; i < N; ++i){
  int next_token = S.next();
  unsigned int r = rand() % i;
  if(r < K)reservoir[i] = next_token;
 }
 return reservoir;
}
Advertisements