Jonathan Boccara's blog

STL Function objects: Stateless is Stressless

Published January 23, 2017 - 4 Comments

Daily C++

The need for function objects arises almost as soon as you start using the STL. This post shows how to design them so that they help make your STL code more expressive and more robust.

 

Function objects

Here is a brief recap on function objects before getting to the meat. If you’re already familiar with them you can skip to the next section.

A function object is an object that can be used in a function call syntax:

myFunctionObject(x);

even though it is declared with a class (or a struct). This syntax is allowed by the declaration of an operator():

class MyFunctionObject
{
public:
    void operator()(int x)
    {
        ....
    }
};

The advantage of function objects over simple functions is that function objects can carry data:

class MyFunctionObject
{
public:
    explicit MyFunctionObject(Data data) : data_(data) {}
    void operator()(int x)
    {
        ....usage of data_....
    }
private:
    Data data_;
};

And at call site:

MyFunctionObject myFunctionObject(data);

myFunctionObject(42);

This way the function call will use both 42 and data to execute. This type of object is called a functor.

In C++11, lambdas fill the same need with a lighter syntax:

Data data;
auto myFunctionObject = [data](int x){....usage of data....};

myFunctionObject(42);

Since lambdas arrived in the language in C++11, functors are used much less, although there remain some cases where you need them, as will be shown in a dedicated post (scheduled Feb 07th).

Functions, functors and lambdas can be used with the same function-call syntax. For this reason they are all callables.
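As a minimal sketch of what being a callable means in practice, the template below accepts a function, a functor or a lambda alike. The names applyTo42, addOne and AddValue are made up for this illustration and are not part of the STL.

```cpp
#include <cassert>

// Hypothetical helper: calls any callable with the value 42.
template<typename Callable>
int applyTo42(Callable callable)
{
    return callable(42);
}

// A plain function.
int addOne(int x) { return x + 1; }

// A functor carrying some data.
class AddValue
{
public:
    explicit AddValue(int value) : value_(value) {}
    int operator()(int x) const { return x + value_; }
private:
    int value_;
};

// The same helper works with the three kinds of callables.
void demoCallables()
{
    int value = 3;
    int viaFunction = applyTo42(addOne);
    int viaFunctor  = applyTo42(AddValue(2));
    int viaLambda   = applyTo42([value](int x){ return x + value; });

    assert(viaFunction == 43);
    assert(viaFunctor == 44);
    assert(viaLambda == 45);
}
```

STL algorithms take their callables as template parameters in much the same way, which is why all three forms can be passed to them.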

Callables are used profusely with the STL because algorithms have generic behaviours that are customized by callables. Take the example of for_each. for_each iterates over the elements of a collection and does something with each of them. This something is described by a callable. The following examples bump up every number of a collection by adding 2 to it, and show how to achieve this with a function, a functor and a lambda:

  • with a function the value 2 has to be hardcoded:
    void bump2(double& number)
    {
        number += 2;
    }
    
    std::vector<double> numbers = {1, 2, 3, 4, 5};
    
    std::for_each(numbers.begin(), numbers.end(), bump2);
  • with a functor, the bump value can be passed as a parameter, which allows greater flexibility but with a heavier syntax:
    class Bump
    {
    public:
        explicit Bump(double bumpValue) : bumpValue_(bumpValue) {}
        void operator()(double& number) const
        {
            number += bumpValue_;
        }
    private:
        double bumpValue_;
    };
    
    std::vector<double> numbers = {1, 2, 3, 4, 5};
    
    std::for_each(numbers.begin(), numbers.end(), Bump(2));
    
  • and the lambda allows the same flexibility, but with a lighter syntax:
    std::vector<double> numbers = {1, 2, 3, 4, 5};
    
    double bumpValue = 2;
    std::for_each(numbers.begin(), numbers.end(),
                  [bumpValue](double& number){number += bumpValue;});
    

     

These examples show the syntax to manipulate function objects with the STL. Now here is the guideline to use them effectively: keep state away from them.

Avoid keeping a state in function objects

It may be tempting, especially when you start out using the STL, to put variables in the data carried by your function objects, for example to store current results updated during the traversal of the collection, or to store sentinels.

Even though lambdas supersede functors in standard cases, many codebases are still catching up to C++11 (as exposed in this article) and don’t have lambdas available yet. Moreover, as mentioned above, there remain cases that can only be solved by a functor. For these reasons I want to cover functors as well as lambdas in this post, and in particular see how this guideline of avoiding state applies to both of them.

Functors

Let’s consider the following code that aims at counting the number of occurrences of the value 7 in the collection numbers.

class Count7
{
public:
    Count7() : counter_(0) {}
    void operator()(int number)
    {
        if (number == 7) ++counter_;
    }
    int getCounter() const {return counter_;}
private:
    int counter_;
};

At call site, this functor can be used this way:

std::vector<int> numbers = {1, 7, 4, 7, 7, 2, 3, 4};
    
int count = std::for_each(numbers.begin(), numbers.end(), Count7()).getCounter();

Here we instantiate a functor of type Count7 and pass it to for_each. The searched number could be parametrized in the functor to be able to write Count(7), but this is not the point here: I want to focus on the state maintained in the functor. for_each applies the passed functor to every element in the collection and then returns it. This way we get to call the getCounter() method on the unnamed functor returned by for_each.

The convoluted nature of this code hints that something is wrong in its design.

The problem here is that the functor has a state, its member counter_, and functors don’t play well with state. To illustrate this, you may have wondered: why use this relatively unknown feature of the return value of for_each? Why not simply write the following code:

std::vector<int> numbers = {1, 7, 4, 7, 7, 2, 3, 4};
    
Count7 count7;
std::for_each(numbers.begin(), numbers.end(), count7);

int count = count7.getCounter();

This code creates a counting functor, passes it to for_each and retrieves the counter result. The problem with this code is that it simply doesn’t work. If you compile and run it, you will see that the value in count is 0. Can you see why?

The reason is that, surprising as it sounds, count7 has never reached the inside of for_each. Indeed for_each takes its callable by value, so it is a copy of count7 that was used by for_each and that had its state modified.
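Here is a small sketch that makes the copy visible, reusing the Count7 functor from above. The Counters struct and the countSevens function are hypothetical names added only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class Count7
{
public:
    Count7() : counter_(0) {}
    void operator()(int number)
    {
        if (number == 7) ++counter_;
    }
    int getCounter() const {return counter_;}
private:
    int counter_;
};

// Hypothetical pair of results, for illustration only.
struct Counters
{
    int original;     // counter of the functor we created
    int returnedCopy; // counter of the functor returned by for_each
};

Counters countSevens(const std::vector<int>& numbers)
{
    Count7 count7;
    // for_each works on a copy of count7 and returns that copy.
    Count7 returned = std::for_each(numbers.begin(), numbers.end(), count7);
    return Counters{count7.getCounter(), returned.getCounter()};
}

void demoLostState()
{
    std::vector<int> numbers = {1, 7, 4, 7, 7, 2, 3, 4};
    Counters counters = countSevens(numbers);

    assert(counters.original == 0);     // the original never saw any element
    assert(counters.returnedCopy == 3); // the copy did all the counting
}
```

The functor we created still holds 0, while the one returned by for_each holds the three 7s of the collection.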

This is the first reason why you should avoid states in functors: states get lost.

This is visible in the above example, but it goes further than this: for_each has the particularity of keeping the same instance of the functor throughout the traversal of the collection, but this is not the case for all algorithms. Other algorithms do not guarantee that they will use the same instance of the callable throughout the traversal. Instances of callables may then be copied, assigned or destroyed during the execution of an algorithm, making it impossible to maintain a state. To find out exactly which algorithms provide the guarantee, you can look it up in the standard, but some very common ones (like std::transform) do not.

Now there is another reason why you should avoid states within function objects: it makes code more complex. Most of the time there is a better, cleaner and more expressive way. This also applies to lambdas, so read on to find out what it is.

Lambdas

Let’s consider the following code using a lambda that aims at counting the number of occurrences of the number 7 in numbers:

std::vector<int> numbers = {1, 7, 4, 7, 7, 2, 3, 4};

int count = 0;
std::for_each(numbers.begin(), numbers.end(),
              [&count](int number){ if (number == 7) ++count;});
 
std::cout << count << std::endl;

This code calls for_each to traverse the whole collection and increments the variable count (captured by reference in the lambda) each time a 7 is encountered.

This code is not good because it is too complex for what it is trying to do. It shows the technical way of counting elements by exposing its state, whereas it should simply say that it is counting 7s in the collection, and any implementation state should be abstracted away. This really ties in with the principle of Respecting levels of abstraction, which I deem to be the most important principle for programming.

What to do then?

Pick the right high-level construct(s)

There is one easy way to re-write the particular example above, that would be compatible with all versions of C++ for that matter. It consists of taking for_each out of the way and replacing it with count which is cut out for the job:

std::vector<int> numbers = {1, 7, 4, 7, 7, 2, 3, 4};

int count = std::count(numbers.begin(), numbers.end(), 7);

Of course this does not mean that you never need functors or lambdas – you do need them. But the message I’m trying to convey is that if you find yourself in need for a state in a functor or a lambda, then you should think twice about the higher-level construct you are using. There is probably one that better fits the problem you are trying to solve.
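As an illustration of a lambda that is still needed but carries no state, here is a sketch with std::count_if, for the case where the condition is more than an equality test. The function countEven and the choice of counting even numbers are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the even numbers. The lambda captures nothing and modifies nothing:
// the counting state lives inside std::count_if, not in the callable.
std::ptrdiff_t countEven(const std::vector<int>& numbers)
{
    return std::count_if(numbers.begin(), numbers.end(),
                         [](int number){ return number % 2 == 0; });
}

void demoCountIf()
{
    std::vector<int> numbers = {1, 7, 4, 7, 7, 2, 3, 4};
    assert(countEven(numbers) == 3); // 4, 2 and 4
}
```

The call site says what it does (count the elements satisfying a condition), and the callable only describes the condition.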

Let’s look at another classical example of state within a callable: sentinels.

A sentinel value is a variable that is used to terminate an algorithm early. For instance, goOn is the sentinel in the following code:

std::vector<int> numbers = {8, 4, 3, 2, 10, 4, 2, 7, 3};

bool goOn = true;
for (size_t n = 0; n < numbers.size() && goOn; ++n)
{
    if (numbers[n] < 10)
    {
        std::cout << numbers[n] << '\n';
    }
    else
    {
        goOn = false;
    }
}

The intention of this code is to print the numbers of the collection while they are smaller than 10, and to stop as soon as a number that is 10 or greater is encountered during the traversal.

When refactoring this code in order to benefit from the expressiveness of the STL, one may be tempted to keep the sentinel value as a state in a functor/lambda.

The functor could look like:

class PrintUntilTenOrMore
{
public:
    PrintUntilTenOrMore() : goOn_(true) {}

    void operator()(int number)
    {
        if (number < 10 && goOn_)
        {
            std::cout << number << '\n';
        }
        else
        {
            goOn_ = false;
        }
    }

private:
    bool goOn_;
};

And at call site:

std::vector<int> numbers = {8, 4, 3, 2, 10, 4, 2, 7, 3};
std::for_each(numbers.begin(), numbers.end(), PrintUntilTenOrMore());

The analogous code with a lambda would be:

std::vector<int> numbers = {8, 4, 3, 2, 10, 4, 2, 7, 3};

bool goOn = true;
std::for_each(numbers.begin(), numbers.end(), [&goOn](int number)
{
    if (number < 10 && goOn)
    {
        std::cout << number << '\n';
    }
    else
    {
        goOn = false;
    }
});

But these pieces of code have several issues:

  • the state goOn makes them complex: a reader needs time to mentally work out what is going on with it
  • the call site is contradictory: it says that it does something “for each” element, and it also says that it stops once a value of ten or more is reached.

There are several ways to fix this. One is to take the test out of the for_each by using a find_if:

auto first10 = std::find_if(numbers.begin(), numbers.end(), [](int number){return number >= 10;});
std::for_each(numbers.begin(), first10, [](int number){std::cout << number << std::endl;} );

No more sentinel, no more state.
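To check that this behaves like the original loop, the two calls can be wrapped in a function that writes to any output stream. This is only a sketch: the function name printUntilTenOrMore and the std::ostream parameter are assumptions introduced to make the result easy to verify.

```cpp
#include <algorithm>
#include <cassert>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical wrapper: prints the numbers located before the first one
// that is 10 or more.
void printUntilTenOrMore(const std::vector<int>& numbers, std::ostream& out)
{
    auto first10 = std::find_if(numbers.begin(), numbers.end(),
                                [](int number){ return number >= 10; });
    // The lambda only refers to the output stream; it keeps no state about
    // the traversal itself.
    std::for_each(numbers.begin(), first10,
                  [&out](int number){ out << number << '\n'; });
}

void demoPrintUntilTenOrMore()
{
    std::vector<int> numbers = {8, 4, 3, 2, 10, 4, 2, 7, 3};
    std::ostringstream out;
    printUntilTenOrMore(numbers, out);
    assert(out.str() == "8\n4\n3\n2\n");
}
```

Passing std::cout as the second argument gives the same output as the snippet above.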

This works well in this case, but what if we needed to filter based on the result of a transformation, like the application of a function f to a number? That is to say if the initial code was:

std::vector<int> numbers = {8, 4, 3, 2, 10, 4, 2, 7, 3};

bool goOn = true;
for (size_t n = 0; n < numbers.size() && goOn; ++n)
{
    int result = f(numbers[n]);
    if (result < 10)
    {
        std::cout << result << '\n';
    }
    else
    {
        goOn = false;
    }
}

Then you would want to use std::transform instead of std::for_each. But in this case the find_if would also need to call f on each element, which doesn’t make sense because you would apply f twice on each element, once in the find_if and once in the transform.
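The duplicated work can be made visible with a sketch like the following. The function f here is a stand-in (the post does not specify it), and the global callsToF counter exists only as instrumentation for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

int callsToF = 0; // instrumentation only: counts how often f is applied

// Hypothetical f, chosen arbitrarily for the example.
int f(int number)
{
    ++callsToF;
    return number + 1;
}

std::vector<int> transformedWhileLessThan10(const std::vector<int>& numbers)
{
    // First pass: f is applied to locate the first result that is 10 or more.
    auto firstTooBig = std::find_if(numbers.begin(), numbers.end(),
                                    [](int number){ return f(number) >= 10; });
    // Second pass: f is applied again to the very same elements.
    std::vector<int> results;
    std::transform(numbers.begin(), firstTooBig, std::back_inserter(results), f);
    return results;
}

void demoDoubleApplication()
{
    std::vector<int> numbers = {8, 4, 3, 2, 10, 4, 2, 7, 3};
    callsToF = 0;
    std::vector<int> results = transformedWhileLessThan10(numbers);

    assert((results == std::vector<int>{9, 5, 4, 3}));
    assert(callsToF > 4); // more calls to f than results produced
}
```

Only four results are produced, yet f is applied to the first four elements twice, plus once more on the element that stops the search.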

A solution here would be to use ranges. The code would then look like:

for_each(numbers | transform(f) | take_while(lessThan10),
         [](int number){std::cout << number << std::endl;});

Want to know more about ranges? Then head over to that post.
