Jonathan Boccara's blog

Partitioning Data with Output Iterators in C++

Published March 1, 2019 - 0 Comments

A couple of months (or years?) back, we saw that partitioning in the STL meant tidying up data according to a predicate: all that satisfy the predicate in one group, and all that don’t satisfy the predicate in another group:

partition STL

This is what the STL algorithms std::partition (or std::stable_partition to keep the relative order of elements) do:

auto numbers = std::vector<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
std::stable_partition(begin(numbers), end(numbers), [](int n){ return n % 2 == 0; });

for (auto const& number : numbers)
    std::cout << number << ' ';

The above program outputs:

2 4 6 8 10 1 3 5 7 9

All the elements satisfying the predicate are first, the others after.

But there is another way to perform a partition with the STL: putting the values in separate collections. One collection for the elements that satisfy the predicate, and another one for the elements that don’t:

auto const numbers = std::vector<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
auto evenNumbers = std::vector<int>{};
auto oddNumbers = std::vector<int>{};

std::partition_copy(begin(numbers), end(numbers), back_inserter(evenNumbers), back_inserter(oddNumbers), [](int n){ return n % 2 == 0; });

std::cout << "Even numbers:\n";
for (auto const& number : evenNumbers)
    std::cout << number << ' ';

std::cout << "\nOdd numbers:\n";
for (auto const& number : oddNumbers)
    std::cout << number << ' ';

Note that numbers is now const, since the operation is no longer in place. The outputs are in evenNumbers and oddNumbers and the above code outputs:

Even numbers:
2 4 6 8 10 
Odd numbers:
1 3 5 7 9

Let’s now move that logic out of the algorithm and into the output iterator.

Why a smart output iterator

Before getting into the implementation of an output iterator that performs the equivalent of std::partition_copy, why would we want to do such a thing in the first place?

For two reasons:

  • breaking off the flow of operations on a collection into two branches,
  • chaining additional operations in either or both of those two branches.

To my knowledge, we can’t do this with C++ standard components, including with ranges that are coming up in C++20.

Indeed, ranges allow to chain operations, as long as they follow a linear flow:

numbers | ranges::view::transform(f) | ranges::view::filter(p);

Or they can apply operations that make the data converge, that is to say if several sources of data contribute to one result:

ranges::view::set_difference(numbers, otherNumbers) | ranges::view::transform(f);

But ranges can’t make the data flow diverge, or break off into several directions. This is a key difference between ranges and smart output iterators. They can complete each other, like we’ll see in a future post.

We’ve already seen some smart output iterators, such as transform and filter:

auto const times2 = transform([](int i) { return i*2; });

std::copy(begin(numbers), end(numbers), times2(back_inserter(results));

Or, as we’ll see in a future post, we can have a nicer syntax:

ranges::copy(numbers, transform([](int n){return n*2;}) >>= back_inserter(results));

Or something even nicer by hiding the call to copy.

If you hadn’t heard of smart output iterators before, you may want to check out this introductory post on smart output iterators, or check out the library on Github.

The partition iterator

Now that we’ve seen the rationale for implementing a partition output iterator, let’s decide what we’d like its usage to look like (proceeding this way makes code more expressive):

auto const isEvenPartition = partition([](int n){ return n % 2 == 0; });
    
std::copy(begin(input), end(input), isEvenPartition(back_inserter(evenNumbers), back_inserter(oddNumbers)));

To do this, we’ll follow our model for implementing smart output iterators, inspired from one of the most basic smart output iterators out there, the standard back_inserter.

We start by implementing operator*, that does nothing but returning itself, in order to keep control on the operator= that the STL algorithm will typically call afterwards:

output_partition_iterator& operator*(){ return *this; }

Same thing for operator++, not much to do:

output_partition_iterator& operator++(){ return *this; }
output_partition_iterator& operator++(int){ ++*this; return *this; }

The logic happens in operator=. operator= receives a value, and needs to send it to either one of the underlying iterators, according to whether or not it satisfies the predicate.

What follows of the previous sentence is that the iterator has to have access to both its underlying iterators, and to the predicate. We can store them as members in the class, and initialize them in the constructor. The concerned part of the class definition would then look like this:

output_partition_iterator(IteratorTrue iteratorTrue, IteratorFalse iteratorFalse, Predicate predicate)
    : iteratorTrue_(iteratorTrue)
    , iteratorFalse_(iteratorFalse)
    , predicate_(predicate) {}

private:
    IteratorTrue iteratorTrue_;
    IteratorFalse iteratorFalse_;
    Predicate predicate_;

Finally, we can implement the operator=:

output_partition_iterator& operator=(T const& value)
{
    if (predicate_(value))
    {
        *iteratorTrue_ = value;
        ++iteratorTrue_;
    }
    else
    {
        *iteratorFalse_ = value;
        ++iteratorFalse_;
    }
    return *this;
}

Matching the desired usage

Remember the desired usage: we wanted to construct the iterator in two phases. First, a function partition, that constructed a intermediate object:

auto const isEvenPartition = partition([](int n){ return n % 2 == 0; });

Then we’d use this object to take the underlying iterators and create the smart iterator we designed above:

isEvenPartition(back_inserter(evenNumbers), back_inserter(oddNumbers))

We therefore need an intermediary type that takes the predicate in its constructor, and has an operator() taking the two underlying iterators to send data to, and returning the output_parititon_iterator that we designed.

Let’s call this type output_partitioner:

template<typename Predicate>
class output_partitioner
{
public:
    explicit output_partitioner(Predicate predicate) : predicate_(predicate) {}
    template<typename IteratorTrue, typename IteratorFalse>
    output_partition_iterator<IteratorTrue, IteratorFalse, Predicate> operator()(IteratorTrue iteratorTrue, IteratorFalse iteratorFalse) const
    {
        return output_partition_iterator<IteratorTrue, IteratorFalse, Predicate>(iteratorTrue, iteratorFalse, predicate_);
    }
    
private:
    Predicate predicate_;
};

The partition function now just builds an output_partitioner (in C++17 with template type deduction in constructors, partition could have been the object we called output_partitioner):

template<typename Predicate>
output_partitioner<Predicate> partition(Predicate predicate)
{
    return output_partitioner<Predicate>(predicate);
}

Et voilà le travail!

The whole code is up on Github.

Now we can use partition to route the output of an algorithm into two branches, and combine this with other output iterators:

auto const isEvenPartition = partition([](int n){ return n % 2 == 0; });
auto const times2 = transform([](int n) { return n*2; });
auto const moreThan3 = filter([](int n) { return n>3; });

ranges::set_difference(input1, input2,
                       isEvenPartition(times2(back_inserter(output1)),
                                       moreThan3(back_inserter(output2)));

This code expresses a lot in a few lines, compared to what the version with STL algorithms or for loops would have looked like.

More than two outputs

Our partition iterator can split data into two branches according to a predicate. But what if we’d like to split into more than two? What would the interface look like? And the implementation?

This is what we explore in a future post, with the demultiplexer output iterator. But before that we’ll need some pre-requisite, including being able to apply STL-like algorithms on std::tuple.

Also, I don’t find the name “Smart output iterator” very catchy. Can you think of a better name for the library? Ouputors, maybe? Or another name? Please leave a comment with your suggestion!

You will also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin