Jonathan Boccara's blog

The importance of knowing STL <algorithm>s

Published January 5, 2017 - 10 Comments

STL algorithms are a fantastic set of tool to improve expressiveness and correctness of your code. As outlined in Sean Parent’s famous talk C++ Seasoning the reality is quite straightforward: one needs to know his algorithms.

This post explains you how STL algorithms are to be used, and what they can bring to you.

Algorithms versus for loops

Let’s start with an example of code that could be found in production code. Could you tell what this code does ?

for (std::vector<company::salesForce::Employee>::const_iterator it = employees.begin(); it != employees.end(); ++it)
{
    employeeRegister.push_back(*it);
}

 

If you’re like most developers I know, you will scan this code and work out in 10 to 15 seconds that this code makes a copy of the elements from the collection of employees over to some register.

Now can you tell what this second piece of code does ?

std::copy(employees.begin(), employees.end(), std::back_inserter(employeeRegister));

 

Even if you don’t know what std::back_inserter means (which you will anyway, if you read on to the next section), you can instantly know that employees are copied to a register, because it is written in the code: copy. In this individual two-lines example the difference of time is not that big – it is only 10 to 15 seconds. But when you multiply this by the number of lines in your codebase, and when you consider more complex use cases, this really adds up to impair the reading of code.

std::copy is an algorithm of the STL, and can be found by #includeing the header <algorithm>. I realize that some things in this code are more noise than information, like .begin() and .end() for example, but this will be refined with ranges, which we explore in a dedicated post. Anyway this STL usage sets the basis for stating explicitly what action is performed.

Basically, STL algorithms say what they do – not how they do it. This really ties up with Respecting levels of abstractions, as explained in the dedicated post on this central principle.

std::copy and std::back_inserter

If you get that the above code does a copy but you don’t know yet the details of std::copy and std::back_inserter, let’s dive into it right now. This is an important example to understand because it is fairly common. Otherwise, you can just skip to the next section.

std::copy takes three iterators in input:

  • The begin and end of the input range, containing the elements to be copied from
  • The begin of the output range, where the copies should be put

Here is its prototype:

template <typename InputIterator, typename OutputIterator>
OutputIterator copy(InputIterator first, InputIterator last, OutputIterator out);

In the STL, the begin of a range is an iterator that points to its first element, and by convention the end of a range is an iterator that points to one after its last element:

The output iterator of std::copy is the beginning of the range the elements will be copied to.

std::copy iterates over the input ranges and successively copies all element over to the range starting with the out iterator:

As seen in the above figure, std::copy needs some space in the output collection to put all the elements it copies from the input. Most of the time though, working out in advance how much space should be made in the output collection and resizing it is impractical.

This is where std::back_inserter comes into play. std::back_inserter creates an iterator that is connected to the container it is passed. And when you write through this iterator, it will in fact call the push_back method of this container with the value you are trying to write. This effectively relieves the programmer – you – from resizing the output collection if it is a vector (like it is in most of the cases), because space is made by the output iterator directly each time std::copy writes through it.

As a result, the code using std::copy can be written this way:

std::copy(employees.begin(), employees.end(), std::back_inserter(employeeRegister));

This is regular C++. This is what the language natively offers as of this writing (<= C++17), although the topic of ranges allows to go much further. You should be able to read such code, and not be afraid to write it.

The advantages of using algorithms

As explained above, one of the main advantage algorithms bring is expressiveness, by raising the level of abstraction of code. That is to say they show what they do, rather than how they are implemented.

However they also bring along several other advantages:

  • They avoid some common mistakes, like off-by-one errors or dealing with empty collections. When you write a for loop, you always need to make sure that it stops at the right step, and it behaves correctly when there is no element to iterate over. All algorithms deal with these for you.
  • When using an STL algorithms, you get an implementation of a certain level of quality. These algorithms have been implemented by people who knew what they were doing, and have been extensively tested. By using them you benefit from this level of quality.
  • STL algorithms bring you the best algorithmic complexity you can get.  std::copy is quite straightforward to get right, but there are other more complex algorithms that could be naively implemented in O(n²) but that could be optimized to O(n) for example, like algorithms on sets. The STL offers the best implementation in this regard.
  • The design of the STL decouples algorithms from the data they operate on, so that data and operations can evolve independently, at least to a certain extent.

 

Two pitfalls to be aware of when adopting algorithms

Hopefully by now you have decided to use STL algorithms to improve your code. But before you start, there are two classical pitfalls that you need to know.

Don’t use  for_each for each problem

If you come from the habit of writing for loops, you could be attracted by std::for_each, because this algorithm somewhat looks like a for loop. Indeed for_each successively applies a function (or functor or lambda) to all the elements of a collection:

template <typename InputIterator, typename Function>
Function for_each(InputIterator first, InputIterator last, Function f);

std::for_each is indeed an STL algorithm and for this reason it is a good thing to have it in your toolbox. But there is mainly one specific case where for_each is effectively adapted: when performing side effects. Indeed for_each should be used to modify the elements of the collection it is applied on, or to perform side effects on a more general sense, like sending information to a logger or to an external service.

If, for instance, you rather need to count the number of times a value is present is a collection, don’t use for_each. Use std::count.
If you need to know if there is at least one element satisfying a predicate in your collection don’t use for_each. Use std::any_of.
If you need to know whether all the elements of a collection satisfy a given predicate, use std::all_of.
If you need to know whether a collection is a permutation of another one, in the most efficient way possible, use std::is_permutation.

And so on.

The STL offers an vast variety of ways to express your intent to make your code as expressive as possible. You can benefit from this by choosing the algorithm that fits best in each given situation (or write your own, as we will cover in a future post).

So many algorithms

The variety of algorithms there are can be somewhat overwhelming. The second pitfall when moving to algorithms is that when you look them up on a reference like this one, you will recognize a couple of them, like copy, count or find, and easily see how these can be useful to you.

But alongside in the list are algorithms whose names may sound mysterious to you, like std::lexicographical_compare, std::set_symmetric_difference or std::is_heap_until.

A natural reaction would be to ignore these strange looking algorithms, because you may think they are very complicated or designed for specific situations you will never encounter. I for sure had this reaction when I first started with STL algorithms.

But this is wrong. Nearly all algorithms are useful in day-to-day code.

Let’s take the example of std::set_difference. Do you know this algorithm ? It does a difference of sets (a set in a sense of a sorted collection, not only std::set). That is to say with a sorted collection A and a sorted collection B, set_difference outputs the elements in A that are not present in B:

How can this be useful ?

Let’s take an example of a calculation model that does caching. Each time this model is computed, it produces several results that can be added to the cache. We represent the cache as an associative container with keys and values where several identical keys are allowed, which is what std::multimap is made for.

So the model produces results this way:

std::multimap<Key, Value> computeModel();

And the caching can accept new data this way:

void addToCache(std::multimap<Key, Value> const& results);

In the implementation of the addToCache function, we need to be careful not to add results that already exist in the cache, to avoid duplicates adding up.

Here is how this could be implemented without using algorithms:

for (std::multimap<Key, Value>::const_iterator it = newResults.begin(); it != newResults.end(); ++it)
{
    std::pair<std::multimap<Key, Value>::const_iterator, std::multimap<Key, Value>::const_iterator> range = cachedResults.equal_range(it->first);
    if (range.first == range.second)
    {
        std::multimap<Key, Value>::const_iterator it2 = it;
        while (!(it2->first < it->first) && !(it->first < it2->first))
        {
            ++it2;
        }
        cachedResults.insert(it, it2);
    }
}

I don’t suggest you try to understand the above code line by line. Rather, we can reformulate the problem differently: we need to add to the cache the elements that are in the results, but that are not in the cache. This is what std::set_difference is made for:

std::multimap<Key, Value> resultsToAdd;

std::set_difference(newResults.begin(),
                    newResults.end(),
                    cachedResults.begin(),
                    cachedResults.end(),
                    std::inserter(resultsToAdd, resultsToAdd.end()),
                    compareFirst);

std::copy(resultsToAdd.begin(), resultsToAdd.end(), std::inserter(cachedResults, cachedResults.end()));

std::inserter is similar to std::back_inserter excepts it call the insert method of the container it is associated with instead of push_back, and compareFirst is a function we define to tell std::set_difference to compare elements on their keys rather that on the pair key-value.

Compare the two pieces of code. The second one tells what is does (a set difference), while the first one only invites you to decipher it. In this particular example, there remains a bit too many arguments that are passed to set_difference though, which may make it somewhat difficult to understand when you are not used to it. This issue is mostly solved with the concept of ranges, presented in this post.

Just like you understand language constructs such as  if and for, you need to understand the components of the STL to be able to understand what the code is trying to tell you. Plainly said,  you need to know your algorithms.

Learning all of them takes time, but it is a useful investment. I will present them along various posts grouped by themes (the first one being scheduled for Jan 17) so that you can see the logic between them. Hopefully this should make it easier for you to remember as many of them as possible, as effortlessly as possible.

Related articles:

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin

Comments are closed