Jonathan Boccara's blog

STL Algorithms on Sets: One Algorithm to Implement Them All

Published July 24, 2020 - 0 Comments

The STL algorithms on sets are one of the most convenient things the C++ standard library offers. We’re going to see how they can all be implemented with the same core algorithm.

This article is part of our series on algorithms on sets, which now includes:

Algorithms that look like each other

The STL offers 4 algorithms on sets that look like each other. They all take two sorted collections, A and B, and:

  • std::set_difference outputs the elements that are in A and not in B,
  • std::set_intersection outputs the elements that are both in A and in B,
  • std::union output the elements that in A or in B,
  • std::set_symmetric_difference outputs the elements that are in A and not in B or in B and not in A (or said differently, in A xor in B).

They all benefit from the fact that A and B are sorted to operate in linear complexity (size of A + size of B). For more details about the algorithms on sets, check out this refresher first.

Even if they all do different things, they’re overall pretty similar. Couldn’t we write a core algorithm that they could all be implemented with?

That question has been in the back of my mind for a while. At one occurrence of Meeting C++ I had the chance to meet Sean Parent and to discuss this with him. Sean suggested that this could be done by associating a logical predicate to each algorithm: set_insersection is AND, set_union is OR, and so on.

Let’s write code to do that.

set_logical_operation

Let’s call our common algorithm set_logical_operation.

set_logical_operation takes two input collections and an output iterator. On top of that, set_logical_operation takes a logical predicate: a function that takes two bools and returns a bool.

Let’s write the expected call site first, as this generally allows to write simple code:

// equivalent to std::set_intersection
set_logical_operation(A, B, std::back_inserter(results), std::logical_and<int>{});

// equivalent to std::set_union
set_logical_operation(A, B, std::back_inserter(results), std::logical_or<int>{});

// equivalent to std::set_symmetric_difference (predicate is XOR)
set_logical_operation(A, B, std::back_inserter(results), [](bool inLeft, bool inRight){ return inLeft ^ inRight;});

// equivalent to std::set_difference
set_logical_operation(A, B, std::back_inserter(results), [](bool inLeft, bool inRight){ return inLeft && !inRight;});

Now that we’re clear on what its interface should look like, let’s move on to implementing set_logical_operation.

Implementing set_logical_operation

Here is the prototype of set_logical_operation:

template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation>
OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator&& out, LogicalOperation logicalOperation)
{

With the predicate passed to set_logical_operation, we can determine three things:

  • should we keep the elements that are in A and not in B?
  • should we keep the elements that are both in A and in B?
  • should we keep the elements that are in B and not in A?

To do this, we can invoke the predicate with the following respective calls:

  • logicalOperation(true, false)
  • logicalOperation(true, true)
  • logicalOperation(false, true)

Depending on those values, we want various parts of the outputs of set_segregate. set_segregate is a non-standard algorithm on sets that takes two sorted collections A and B, and three output iterators to which it sends respectively:

  • the elements that are in A and not in B,
  • the elements that are both in A and in B,
  • the elements that are in B and not in A.

Its prototype is:

template<class SetA, class SetB,
         class OutputOnlyA, class OutputBoth, class OutputOnlyB>
void set_segregate(Set1&& setA, Set2&& setB,
                   OutputItLeft&& onlyA, OutputItBoth&& both, OutputItRight&& onlyB);

We can implement set_logical_operation by calling set_segregate.

Discarding outputs

The challenging aspect of doing that is to ignore the outputs of set_segregate that we’re not interested in.

To do that we can use the dev_null.

The dev_null is an non-standard output iterator available in the pipes library that ignores the value it receives. Its implementation is this:

struct dev_null
{
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = void;
    using pointer = void;
    using reference = void;

    dev_null& operator*(){ return *this; }
    dev_null& operator++(){ return *this; }
    
    template<typename T>
    dev_null& operator=(T&&){ return *this; }
};

So we need to pass out to the outputs of set_segregate that we want to keep, and dev_null to those we want to discard.

A simple way to do this is to go over all the possibilities for the values of the logical operation:

template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation>
OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator&& out, LogicalOperation logicalOperation)
{
    auto const includeElementsInAOnly = logicalOperation(true, false);
    auto const includeElementsInBOnly = logicalOperation(false, true);
    auto const includeElementsInBoth = logicalOperation(true, true);
    
    if (includeElementsInAOnly && includeElementsInBoth && includeElementsInBOnly)
    {
        set_segregate(setA, setB, out, out, out);
    }
    else if (includeElementsInAOnly && includeElementsInBoth && !includeElementsInBOnly)
    {
        set_segregate(setA, setB, out, out, dev_null{});
    }
    else if (includeElementsInAOnly && !includeElementsInBoth && includeElementsInBOnly)
    {
        set_segregate(setA, setB, out, dev_null{}, out);
    }
    else if (includeElementsInAOnly && !includeElementsInBoth && !includeElementsInBOnly)
    {
        set_segregate(setA, setB, out, dev_null{}, dev_null{});
    }
    else if (!includeElementsInAOnly && includeElementsInBoth && includeElementsInBOnly)
    {
        set_segregate(setA, setB, dev_null{}, out, out);
    }
    else if (!includeElementsInAOnly && includeElementsInBoth && !includeElementsInBOnly)
    {
        set_segregate(setA, setB, dev_null{}, out, dev_null{});
    }
    else if (!includeElementsInAOnly && !includeElementsInBoth && includeElementsInBOnly)
    {
        set_segregate(setA, setB, dev_null{}, dev_null{}, out);
    }
    else if (!includeElementsInAOnly && !includeElementsInBoth && !includeElementsInBOnly)
    {
        set_segregate(setA, setB, dev_null{}, dev_null{}, dev_null{});
    }
    return out;
}

This implementation does the job. However, it seems that we’re repeating a lot of code, and that we could refactor that into more straightforward code.

Simplifying the code with if constexpr

What makes the code challenging is that out and dev_null are of two different types. So we can’t write code like:

if (includeElementsInAOnly)
{
    outputIterator = out;
}
else
{
    outputIterator = dev_null{};
}

But by using C++17’s if constexpr, we can write a function that returns the correct type to use. That function won’t always have the same type, but this is one of the things that if constexpr allows:

template<bool shouldMakeOutputIterator, typename OutputIterator>
decltype(auto) makeOutputIteratorOrDevnull(OutputIterator&& out)
{
    if constexpr (shouldMakeOutputIterator)
    {
        return std::forward<OutputIterator>(out);
    }
    else
    {
        return dev_null{};
    }
}

Depending on the boolean template parameter, this function will either return the output iterator it takes as a parameter, or a dev_null.

If you’re not familiar with if constexpr and the other good things that C++17 provides, get up to speed with Bartek’s book C++17 in detail.

Note that FWD is a non-standard macro to shorten the call to std::forward (thanks Vittorio Romeo):

#define FWD(value) std::forward<decltype(value)>(value)

We can now use our function to implement set_logical_operation:

template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation>
OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator out, LogicalOperation logicalOperation)
{
    auto constexpr includeElementsInAOnly = logicalOperation(true, false);
    auto constexpr includeElementsInBOnly = logicalOperation(false, true);
    auto constexpr includeElementsInBoth = logicalOperation(true, true);

    auto outputAOnly = makeOutputIteratorOrDevnull<includeElementsInAOnly>(FWD(out));
    auto outputBOnly = makeOutputIteratorOrDevnull<includeElementsInBOnly>(FWD(out));
    auto outputBoth = makeOutputIteratorOrDevnull<includeElementsInBoth>(FWD(out));
    
    set_segregate(setA, setB, outputAOnly, outputBoth, outputBOnly);
    
    return out;
}

However, this code end s up calling the constructor of the output iterator up to three times, to construct outputAOnly, outputBoth and outputBOnly.

It will be a move constructor if there is one. But if the output iterator has no move constructor (and Effective Modern C++ recommends in item 29 that we don’t count on move operations in generic code), then they will make copies. If the iterators are begin or back_inserter that’s not too bad, but if they are pipes with large data as context, that may be not desirable.

We can avoid all this by passing the results of the function directly to set_seggregate:

template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation>
OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator&& out, LogicalOperation logicalOperation)
{
    auto constexpr includeElementsInAOnly = logicalOperation(true, false);
    auto constexpr includeElementsInBOnly = logicalOperation(false, true);
    auto constexpr includeElementsInBoth = logicalOperation(true, true);

    set_segregate(setA, setB,
                  makeOutputIteratorOrDevnull<includeElementsInAOnly>(std::forward<OutputIterator>(out)),
                  makeOutputIteratorOrDevnull<includeElementsInBoth>(std::forward<OutputIterator>(out)),
                  makeOutputIteratorOrDevnull<includeElementsInBOnly>(std::forward<OutputIterator>(out)));
    
    return out;
}

One algorithm to rule them all?

With set_logical_operation, we now have a core algorithm that allows to implement the following STL algorithms:

  • std::set_difference,
  • std::set_symmetric_difference,
  • std::set_intersection,
  • std::set_union.

But there is another algorithm on sets that the STL offers: std::includes. std::includes takes two sets A and B and returns a boolean, indicating whether all the elements of B are also in A.

Our new set_logical_operation doesn’t allow to implement std::includes. std::includes belongs to another family of algorithms on sets: the algorithms that compare two sets and return a boolean.

This family of algorithms is what we tackle next in our series on algorithms on sets. Stay tuned!

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin