Jonathan Boccara's blog

How Smart Output Iterators Avoid the TPOIASI

Published February 15, 2019 - 0 Comments

In the last post we saw the TPOIASI, or Terrible Problem Of Incrementing A Smart Iterator, that could incur a performance cost in code that uses range adaptors. Today, we’ll see how smart output iterators fare with the TPOIASI (spoiler: they have a way to avoid the problem).

Now if you’re wondering what smart iterators, smart output iterators or the Terrible Problem Of Incrementing Them is, here is a little refresher.

The TPOIASI

The TPOIASI occurs when an iterator that embeds logic in its operator++ (for instance, advancing to the next element that satisfies a predicate), is plugged onto another iterator, for example one that applies a function in its operator*.

In a range-style code, the situation looks like this:

// Input vector
std::vector<int> numbers = {1, 2, 3, 4, 5};

// Output vector
std::vector<int> results;

//Apply transform and filter
ranges::push_back(results,
                  numbers | ranges::view::transform(times2)
                          | ranges::view::filter(isMultipleOf4));

// Display results
for (auto result : results)
{
    std::cout << result << ' ';
}

with times2 and isMultipleOf4 being:

int times2(int n)
{
   std::cout << "transform " << n << '\n';
   return n * 2;
}

bool isMultipleOf4(int n)
{
    return n % 4 == 0;
}

(note the trace in times2).

The code outputs:

transform 1
transform 2
transform 2
transform 3
transform 4
transform 4
transform 5
4 8

For some elements, 2 and 4, the function is called more than once. This is a problem. And a terrible one because it is – in my opinion – structural to this range adaptor.

We had seen that the source of the problem is that the operator++ of filter that has to peek ahead to know where to stop, and then its operator* calls up the transform function again.

If you’d like to read more about the Terrible Problem Of Incrementing A Smart Iterator, you can have a look at its dedicated article.

Smart output iterators

Smart output iterators are a symmetrical approach to range adaptors, to manipulate collections in C++. This means that while range adaptors operate on input iterators and can funnel data into an STL algorithm, smart output iterators put some logic inside of the output iterators of an algorithm.

Take std::back_inserter for instance. It is an output iterator that embeds a push_back to a container. Smart output iterators generalise this idea by allowing output iterators to apply functions, filter on predicates, and a lot of other fancy treatments, to the data coming out of STL algorithms.

For instance, the equivalent code to the one above that used range adaptors would be, with smart output iterators:

// Input vector
std::vector<int> numbers = {1, 2, 3, 4, 5};

// Output vector
std::vector<int> results;

//Apply transform and filter
auto oIsMultiple4 = make_output_filter(isMultiple4);
auto oTimes2 = make_output_transformer(times2);

copy(numbers, oTimes2(oIsMultiple4(back_inserter(results))));

// Display results
for (auto result : results)
{
    std::cout << result << ' ';
}

Now do smart output iterators suffer from the TPOIASI? Do they call the function in transform multiple times?

When we look at the implementation of the output iterator that filters, its operator++ and operator* implementations are pretty ascetic (like for all output iterators):

template<typename Iterator, typename Predicate>
class output_filter_iterator
{
public:    
    explicit output_filter_iterator(Iterator iterator, Predicate predicate) : iterator_(iterator), predicate_(predicate) {}

    output_filter_iterator& operator++(){ ++iterator_; return *this; }
    output_filter_iterator& operator*(){ return *this; }

    template<typename T>
    output_filter_iterator& operator=(T const& value)
    {
        if (predicate_(value))
        {
            *iterator_ = value;
        }
        return *this;
    }
private:
    Iterator iterator_;
    Predicate predicate_;
};

No checking of the predicate, no reading from the underlying iterator.

Will this be enough to make them immune to the Terrible Problem?

Let’s run that code to find out.

Smart output iterators and the TPOIASI

Running the code with the same trace:

int times2(int n)
{
   std::cout << "transform " << n << '\n';
   return n * 2;
}

bool isMultipleOf4(int n)
{
    return n % 4 == 0;
}

gives this output:

transform 1
transform 2
transform 3
transform 4
transform 5
4 8

No multiple calls to the function!

Does that mean that smart output iterators are immune to the Terrible Problem?

It’s not that simple. The above case appends data to an empty vector, with the help of a back_inserter. But if we change the use case a little, by writing into the vector rather than appending to it:

// Input vector
std::vector<int> numbers = {1, 2, 3, 4, 5};

// Output vector
std::vector<int> results = {0, 0, 0, 0, 0};

//Apply transform and filter
auto oIsMultiple4 = make_output_filter(isMultiple4);
auto oTimes2 = make_output_transformer(times2);

copy(numbers, oTimes2(oIsMultiple4(begin(results))));

// Display results
for (auto result : results)
{
    std::cout << result << ' ';
}

We would expect this:

4 8 0 0 0

But the result we get is in fact that:

0 4 0 8 0

This is a bug. It comes from the operator++ that increments the underlying iterator even if the smart output iterator ends up not writing to it (in the case where the value it is passed doesn’t satisfy the predicate).

Let’s attempt to fix this by changing the implementation of operator++ from this:

    output_filter_iterator& operator++(){ ++iterator_; return *this; }

as it was above, to that:

    output_filter_iterator& operator++(){ return *this; }

By not incrementing the underlying iterator.

The the result we get is now this:

8 0 0 0 0

This is still not good, because we’re never incrementing the underlying iterator, therefore we’re constantly writing at the same position.

No, we’d need to increment the filter iterator only if is has sent something to its underlying iterator. Let’s just write it then:

template<typename Iterator, typename Predicate>
class output_filter_iterator
{
public:    
    explicit output_filter_iterator(Iterator iterator, Predicate predicate) : iterator_(iterator), predicate_(predicate) {}

    output_filter_iterator& operator++(){ return *this; }
    output_filter_iterator& operator*(){ return *this; }

    template<typename T>
    output_filter_iterator& operator=(T const& value)
    {
        if (predicate_(value))
        {
            *iterator_ = value;
            ++iterator_;
        }
        return *this;
    }
private:
    Iterator iterator_;
    Predicate predicate_;
};

Now when we run the code we get:

4 8 0 0 0

And does the case with back_inserter still work? Let’s run it:

4 8

It does still work.

It all looks good except there is a nagging question left:

Is this OK?

Implementing the operator++ by incrementing the underlying sounded natural. Indeed, imagine that an algorithm decided to increment the output iterator twice before assigning it. A std::vector iterator would skip an element, but our smart output iterator would completely be oblivious to that double-increment.

It turns out that it’s ok, because algorithms are not allowed to increment an output iterator twice without calling operator= in between. Indeed, as we can read on cppreference.com, “Assignment through an output iterator is expected to alternate with incrementing. Double-increment is undefined behavior”.

I may well miss something, but this makes this implementation look ok to me, and smart output iterators have avoided the TPOIASI, which looks like a good sign for their design.

If you’d like to see the code of the smart output iterators library, it’s up on GitHub.

You may also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin