Chaining Output Iterators Into a Pipeline

Published August 6, 2019

We’ve been over a varied set of smart output iterators over the past few weeks. Today we explore how to combine them and create expressive code.

If you’re just joining our series on smart output iterators, you might want to check out this introductory post on smart output iterators.

So far, we’ve been combining smart output iterators by using operator():

auto const isEven = filter([](int n){ return n % 2 == 0; });
auto const times2 = transform([](int n){ return n * 2; });

std::vector<int> results;
std::copy(begin(input), end(input), isEven(times2(times2(back_inserter(results)))));

The helpers returned by filter and transform have an operator() that accepts another output iterator and returns a smart output iterator that sends its results to it. In other words, isEven sends to times2 only the elements of input that are even. times2 multiplies every number it receives by 2 and passes it on to another times2, which doubles those results again and sends them to back_inserter, which in turn passes them to the push_back method of results.

Assuming input contains {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, results contains {8, 16, 24, 32, 40} after this code runs.

But combining output iterators this way by using operator() has several drawbacks:

it doesn’t reflect the fact that each one passes data on to the next one
the more iterators there are, the more parentheses build up (and this is C++, not LISP!)
it forces us to define the iterators outside of the statement they’re used in.

To illustrate this last drawback, consider what it would look like to define the output iterators where they’re used:

std::copy(begin(input), end(input), filter([](int n){ return n % 2 == 0; })(transform([](int n){ return n * 2; })(transform([](int n){ return n * 2; })(back_inserter(results)))));

Not really clear. This gets worse if the iterators belong to a namespace, which they should do if we use them in existing code:

std::copy(begin(input), end(input), output::filter([](int n){ return n % 2 == 0; })(output::transform([](int n){ return n * 2; })(output::transform([](int n){ return n * 2; })(back_inserter(results)))));

Even if we pile them up across several lines of code, the transitions between iterators are still unclear:

std::copy(begin(input), end(input), output::filter([](int n){ return n % 2 == 0; })
                                   (output::transform([](int n){ return n * 2; })
                                   (output::transform([](int n){ return n * 2; })
                                   (back_inserter(results)))));

We could declare the lambdas on separate lines, but the syntax remains confusing:

auto isEven = [](int n){ return n % 2 == 0; };
auto times2 = [](int n){ return n * 2; };

std::copy(begin(input), end(input), output::filter(isEven)(output::transform(times2)(output::transform(times2)(back_inserter(results)))));

Compare this with the equivalent code using range-v3:

inputs | ranges::view::filter(isEven) | ranges::view::transform(times2) | ranges::view::transform(times2);

This looks much nicer.

In this post, let’s try to combine output iterators with an operator. In a future post, we will get rid of std::copy and combine range adaptors and smart output iterators in the same expression.

`operator|` and left-associativity

Could we just use operator| to combine smart output iterators, like we do for combining ranges?

It turns out that we can’t, because operator| is left-associative.

What does “left-associative” mean?

If we look back at the expression using ranges, it was (omitting namespaces for brevity):

inputs | filter(isEven) | transform(times2) | transform(times2)

Without a rule to settle it, this expression would be ambiguous. operator| takes two parameters, and the three operator|s need to be executed one after the other, so there are several possible orders:

calling operator| on the first two operands on the left, then calling operator| on the result of this operation and the third one, and so on. This is left-associative, and is equivalent to this:

(((inputs | filter(isEven)) | transform(times2)) | transform(times2))

calling operator| on the last two operands on the right, then calling operator| on the operand before them and the result of this operation, and so on. This is right-associative, and is equivalent to this:

(inputs | (filter(isEven) | (transform(times2) | transform(times2))))

calling the operator| in yet a different order, such as:

(inputs | filter(isEven)) | (transform(times2) | transform(times2))

The last example is neither left-associative nor right-associative.

Now that we’re clear on what left-associative means, let’s go back to operator|: operator| is left-associative. That is part of the C++ grammar, and overloading operator| for our own types does not change it.

A right-associative operator

A left-associative operator makes sense for ranges, because ranges build up from left to right.

Indeed, inputs | filter(isEven) is a range of filtered elements. When we apply a transformation on those elements, we tack on a transform(times2) to this range of filtered elements. That’s why it makes sense to use a left-associative operator:

(((inputs | filter(isEven)) | transform(times2)) | transform(times2))

For output iterators, this is the opposite. If we use operator| to combine them, like this (namespaces again omitted for brevity):

filter(isEven) | transform(times2) | transform(times2) | back_inserter(results);

Then the left-associativity of operator| would dictate that the first operation to be executed in this expression would be:

filter(isEven) | transform(times2)

But unlike inputs | filter(isEven), which represents a filtered range, filter(isEven) | transform(times2) doesn’t represent anything with output iterators: neither of them knows yet where to send its results. It doesn’t stand on its own.

What does represent something and stand on its own is the combination of the last two output iterators:

transform(times2) | back_inserter(results)

It represents an output iterator that applies times2 and sends the result to the push_back method of results.

What we need, then, is a right-associative operator. What right-associative operators are there in C++? Let’s look it up on cppreference.com, which provides this useful table:

[Table: C++ operators, with their precedence and associativity (cppreference.com)]

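As the last column of this table indicates, the right-associative operators are on lines 3 and 16: the unary prefix operators, and the conditional, throw and assignment operators (=, +=, <<=, >>=, |= and so on).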

The operators on line 3 are unary (they take only one operand), so we’re left with line 16. To me, the one that looks most natural for our purpose is operator>>=. If you think otherwise, please leave a comment to express your opinion.

By using operator>>=, our combination of output iterators becomes:

filter(isEven) >>= transform(times2) >>= transform(times2) >>= back_inserter(results)

This leads to clearer code:

std::copy(begin(input), end(input), output::filter(isEven) >>= output::transform(times2) >>= output::transform(times2) >>= back_inserter(results));

We can also pile it up on several lines and/or use inline lambdas:

std::copy(begin(input), end(input),
          output::filter([](int n){ return n % 2 == 0; })
      >>= output::transform([](int n){ return n * 2; })
      >>= output::transform([](int n){ return n * 2; })
      >>= back_inserter(results));

This reads a lot like the ranges style.

The actual implementation

All we’ve seen so far is the interface, and I think this is what matters most. Now that we’ve got it straightened out, we can work on the implementation.

In our case the implementation is quite straightforward. It consists of defining an operator>>= that takes a helper representing an output iterator (say output_transformer, which is what transform returns; see the introductory post on smart output iterators or the actual code of transform for more details) and any other output iterator, and associates the two to create a new output iterator:

template<typename TransformFunction, typename Iterator>
output_transform_iterator<std::tuple<TransformFunction>, Iterator> operator>>=(output_transformer<TransformFunction> const& outputTransformer, Iterator iterator)
{
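    // Plug the helper into the iterator on its right: helper >>= it builds the same iterator as helper(it).
    return outputTransformer(iterator);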
}

Towards more powerful features and a nicer syntax

What would be nicer is to get rid of the call to std::copy, and just write the operations in the form of a pipeline. And what would be even nicer is to combine ranges and smart output iterators in the same expression, to benefit from their respective advantages and get the best of both worlds.

This is what we explore in the next post.

And if you see a way to combine smart output iterators with operator| instead of operator>>=, that would be great: please leave a comment with your idea.


About Jonathan Boccara