Reservoir Sampling

There is a huge set S of numbers of size N that can’t fit in the memory. Choose K of elements from the stream randomly, with equal probability of being selected.

Without to much introduction, let’s just say that the answer to the problem is Reservoir Sampling. An example code could look like:

std::vector<int> reservoir_sampling(Stream S; unsigned long long N, unsigned int K){
 unsigned long long i = 0;
 std::vector<int> reservoir;
 
 // fill the reservoir
 for(; i < K; ++i)
  reservoir.push_back(S.next());

 // replace if needed
 for(; i < N; ++i){
  int next_token = S.next();
  unsigned int r = rand() % i;
  if(r < K)reservoir[i] = next_token;
 }
 return reservoir;
}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s