The Terrible Problem Of Incrementing A Smart Iterator

Published February 12, 2019

The Terrible Problem Of Incrementing A Smart Iterator (or TPOIASI) is a difficulty that arises when implementing smart iterators.

But even if you don’t implement smart iterators, you may use them in a disguised form, now or in the future. And then, the TPOIASI might impact your code in a subtle manner.

Since the world is moving towards smart iterators – well, at least the C++ world – you should know what the TPOIASI is about, because it may try to bite you some day.

Smart iterators 101

To understand the TPOIASI, let’s start by its last two letters: the Smart Iterators. If you’re already familiar with smart iterators and range adaptors, you can skip to the next section.

Iterators

An iterator is a component linked to a range of objects (for example, to an STL container like std::vector), that has two missions:

giving access to the objects in the range, with operator*
moving along the range, with operator++, to access all the elements in the range successively.

Most of the STL iterators, such as those of std::vector or std::map, fulfil these two roles, that together allow to traverse a collection.

Smart iterators

This is not an official term, but a smart iterator is an iterator, so it also does those two jobs. But it does them in a special way.

One example of a smart iterator is the transform iterator, that does not just give access to an element of a range with its operator*. Instead, it gives the result of applying a function f to the element of the range.

Another example is the filter iterator. Its operator++ does not just moves to the adjacent element in the range. It moves to the next element in the range that satisfies a predicate p, (potentially moving past several elements of the range that wouldn’t satisfy p).

Another important aspect of smart iterators is that they can combine with other iterators. For example, a transform iterator can be plugged onto a vector iterator. In its operator*, the transform iterator calls the operator* of the vector iterator, and applies f on the value that the latter returns.

We could then have a filter iterator plugged onto a transform iterator, itself plugged onto a vector iterator. The result is an iterator that skips some of the results of applying f to the vector elements, if they don’t satisfy p. And smart iterators can combine into arbitrarily long chains.

Range adaptors

When the STL manipulates two iterators, like in its algorithms, it’s often to represent a range: one iterator represents the beginning of a range, and the other the end. Rather than having to manipulate those two iterators, it’s often more convenient to directly use a range instead.

A simple definition of a range is: something that provides a begin() and an end() iterator. In this definition, STL containers are ranges.

But the simplest implementation of a range is a structure that contains two iterators and offers a begin() and end() interface that return them.

Back to our smart iterators now. If we have two smart iterators, like two transform iterators, plugged onto the begin and end of the same vector, it can then define a smart range: a range that, when you iterate over it, gives you the results of applying f to each element of the vector.

Packaging this feature nicely into a component that will do the job of generating transform iterators for you, gets to something like this:

myVector | transform([](int n){ return n * 2; });

This a view over myVector, where you see all its values multiplied by 2. This is the sort of code you can write by using ranges libraries, such as range-v3. And ranges may well be the future of the STL.

And combined with filter iterators:

myVector | transform([](int n){ return n * 2; })
         | filter([](int n){ return n % 4; });

This is a view of the values of myVector multiplied by 2, that can be divided by 4.

Now that we have a better feel on what Smart Iterators are, let’s move on to the Terrible Problem Of Incrementing A Smart Iterator.

The TPOIASI

To illustrate the issue, let’s build a simple example using a range library. Here I’m using range-v3 which is available on Wandbox:

// Input vector
std::vector<int> numbers = {1, 2, 3, 4, 5};

// Output vector
std::vector<int> results;

//Apply transform and filter
ranges::push_back(results,
                  numbers | ranges::view::transform(times2)
                          | ranges::view::filter(isMultipleOf4));

// Display results
for (auto result : results)
{
    std::cout << result << ' ';
}

With times2 and isMultipleOf4 being:

int times2(int n)
{
   return n * 2;
}

bool isMultipleOf4(int n)
{
    return n % 4 == 0;
}

Here is what the code outputs:

4 8

Indeed, the numbers piped into transform give {2, 4, 6, 8, 10}, and the multiples of 4 here are 4 and 8, so it’s all fine.

Except there is a problem with this code, and a subtle one because it doesn’t show when you look at the code. Let’s trace the calls to the function in the transform adaptor:

int times2(int n)
{
   std::cout << "transform " << n << '\n';
   return n * 2;
}

Now here is what the code outputs:

transform 1
transform 2
transform 2
transform 3
transform 4
transform 4
transform 5
4 8

For some values the function is called several times!

This may not matter, like in our example with int. But if the function was doing a big computation, then we would notice a performance impact (it happened to me once). Or in the (questionable) case where the function has side effects, we would probably have wrong results.

Now why does the library call the function several times in the first place? To understand this, we need to think about how to implement a filter iterator.

The cause of the TPOISI

The cause of the TPOISI lies in its central I: the Increment operator, operator++, and more specifically the one of the filter iterator.

How would you implement the operator++ of the filter iterator? Imagine that your filter iterator is sitting somewhere in the collection, for instance in front of the first element that satisfies the predicate. In our example, that would be 2*2 = 4:

bug ranges C++

So let’s call operator++ on the filter iterator. The filter iterator calls operator++ on its underlying iterator (here, the transform iterator) and has to peek to the element to check where to stop:

bug range C++

But the filter iterator makes its check on the value returned by the transform iterator. And the transform iterator provides its value by applying its function. So here, we have a our function applied to 3 once and then applied to 4 once.

After calling operator++, the next step to traverse the collection is to get a value from filter iterator, by calling operator* This is what std::copy does, for example. And to provide a value, the filter iterator asks it to its underlying transform iterator, which then calls the function a second time on 4 to compute 4*2:

bug ranges C++

This is why the function times2 is called twice on 4.

How to work around the issue?

Let’s finish up with the first letters of the TPOIASI, the ones that makes it a Terrible Problem.

I call it that because it seems to me like a structural problem in the filter iterator, and filtering is a common need among the manipulations on ranges. Note that the transform iterator doesn’t suffer from the TPOIASI: with a transform on a transform, none of them gets called more than once.

So what’s special about the filter iterator? It is that it customizes the iteration on the underlying container, and has to peek to the underlying iterator to do that.

The issue can be reproduced in range-v3, I had also encountered it when trying to implement a filter iterator, and can’t see how to fix it. If you see how, please write a comment.

It’s not a showstopper for ranges, but it can be a real issue for some cases. In all cases, it’s good to be aware of it.

However, a couple of weeks ago, we have seen another approach to chain up operations on collections: smart output iterators, which are a sort of symmetry to the approach of ranges. Smart output iterator don’t suffer from the The Terrible Problem Of Incrementing A Smart Iterator, or at least not as much as ranges. Even though they have to make a little sacrifice.

How do smart output iterators offer resistance to the TPOIASI? What sacrifice will they have to make? This is what is coming up in the next post on Fluent C++.

About Jonathan Boccara