Remove elements from std::vector

The title of the post would suggest an answer that is just obvious and discourage you from reading it. Unfortunately I meet very often production code, that removes elements from a vector according to a certain predicate, that is written in a horrible way.

Problem definition

The problem starts with a vector of certain objects, for simplicity let us assume we are considering integers. Then we are given a predicate saying that we would like to clean the vector from the values meeting certain condition. In our case we could remove all even numbers, or prime numbers, or numbers with the sum of digits equal 9, or whatever your imagination brings. In the real world the problem could be more sophisticated and we could remove all the connections that were disconnected and there is no reason to maintain them further.

How it usually looks like

What programmers usually do, they iterate over the collection and check every element if it meets a certain condition. If no, the element should be removed. This way of thinking is not bad, however when it comes to implementation, the things usually go wrong. Here is what most of the developers I know usually do:

for(auto i = v.begin(); i != v.end(): i++)
{
   if(predicate(*i))
   {
      i = v.erase(i);
   }
}

Now, you could ask what is wrong with this code. I would answer with another question: what is the complexity? At first glance it might look like it is O(n), but when you think what erase actually does, it appears that it is actually O(n^2). The erase operation is O(n) costly as it has to rearrange the vector after removing elements. We are even informed about this, when we take a more careful look at the standard:

Because vectors use an array as their underlying storage, erasing elements in positions other than the vector end causes the container to relocate all the elements after the segment erased to their new positions. This is generally an inefficient operation compared to the one performed for the same operation by other kinds of sequence containers (such as list or forward_list).

To give you some intuition about what happens, let us imagine the vector that looks like following

[ 5 | 4 | 7 | 8 | 1 | 3 | 6 ] 

We want to remove the element 8. When this happens, all the elements on the right hand from 8 has to be shifted left.

[ 5 | 4 | 7 | x | 1 | 3 | 6 ] 
[ 5 | 4 | 7 | 1 | x | 3 | 6 ] 
[ 5 | 4 | 7 | 1 | 3 | x | 6 ] 
[ 5 | 4 | 7 | 1 | 3 | 6] 

What is O(n) operation. Considering the outer loop we fall into O(n^2) complexity, that is not acceptable as long as we may get something better and in this case we can.

How it should look like

The STL library provides you a set of algorithms that make the operations on the containers easier. One of them is the remove_if operation, that accepts two iterators which indicate the range that should be considered, and the mentioned predicate. The remove_if approach is way more clever than the loop/erase one. L What the specification says about remove_if is:

The removal is done by replacing the elements for which pred returns true by the next element for which it does not, and signaling the new size of the shortened range by returning an iterator to the element that should be considered its new past-the-end element.

To clarify it even more, let us take a look at the example. Let us consider the vector from the previous example and see how remove_if would work when the predicate is “remove if the number is divisible by 4”.

[ 5 | 4 | 7 | 8 | 1 | 3 | 6 ] 
[ 5 | 7 | 1 | 3 | 6 | 4 | 8 ] 
-------------------- xxxxxxxx
       good            bad

As we can see the elements that meet the predicate have been moved to the end of the array. Secondly the remove_if returns the iterator indicating the position after “the last good element” (6 in this case), that we may use to remove the range of “bad elements”. Before we will do it, it is worth to mention that the order of “good elements” is maintained, but it is not necessary the case for “bad elements”.

Finally we could get a code that looks like follows:

v.erase(
   remove_if(v.begin(), v.end(), [](const int & in){return i % 4 == 0;}),
   v.end());

Yes, we still use the erase method, but this time we are removing from the end of the vector so there is no necessity of rearrangement the elements and the erase operation is O(1). In this case the complexity would be O(n), what is quite satisfactory in comparison to O(n^2).

Summary

I think that this article clearly shows how important it is, to understand well the mechanism behind Standard Template Library. It is a really powerful tool and knowing how to use it properly makes your life way easier and your programs way faster.

Advertisements