*There is a huge set S of numbers of size N that can’t fit in the memory. Choose K of elements from the stream randomly, with equal probability of being selected.*

Without to much introduction, let’s just say that the answer to the problem is Reservoir Sampling. An example code could look like:

std::vector<int> reservoir_sampling(Stream S; unsigned long long N, unsigned int K){
unsigned long long i = 0;
std::vector<int> reservoir;
// fill the reservoir
for(; i < K; ++i)
reservoir.push_back(S.next());
// replace if needed
for(; i < N; ++i){
int next_token = S.next();
unsigned int r = rand() % i;
if(r < K)reservoir[i] = next_token;
}
return reservoir;
}

### Like this:

Like Loading...