Jonathan Boccara's blog

Algorithms on Sets That Return a Boolean: Implementing the Generic Algorithm

Published August 14, 2020 - 0 Comments

In the last post in our series on sets, we’ve uncovered 8 algorithms on sets that return a boolean, providing various sorts of comparisons between those two sets:

  • std::includes
  • share_element
  • is_prefix_of
  • is_one_prefix_of_other
  • equivalent
  • disjoint
  • is_before
  • is_after

We also saw that each of those algorithm corresponds to a combination of 4 customisation points in a generic algorithm:

template <typename SetA, typename SetB, typename Compare>
bool includes(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            1st customisation point
        }
        else if (comp(*xB, *xA))
        {
            2nd customisation point
        }
        else
        {
            3rd customisation point
        }
    }
    4th customisation point
}

For example, std::includes corresponds to this combination:

  • 1st customisation point: move on
  • 2nd customisation point: return false
  • 3rd customisation point: move on
  • 4th customisation point: reached the end of setB

You can read the last post to get up to speed on this topic.

Today, we’re attempting to write this generic algorithm that accepts customisation points! Let’s call this generic algorithm set_bool_information.

This article is part of our ever-growing series on sets:

  • How to check if 2 sorted collections have a common element
  • set_match: Matching up Elements Between Sorted Collections
  • Implementing set_match in One Line of Code
  • STL algorithms on sets: one algorithm to implement them all
  • Algorithms on set returning a boolean: exploring the algorithms
  • Algorithms on set returning a boolean: implementing a generic algorithm
  • Algorithms on set returning a boolean: a strong template interface
  • NWH: Adapting algoritms on sets

The challenges in implementing set_bool_information

There are at least two challenges in implementing set_bool_information.

The first one is that the various values of the customisation points don’t do things of the same nature: moving on requires to increment iterators, whereas return true requires to break the execution flow of the function and exit it.

This requires more than simple polymorphism.

The second challenge lies in its interface: we need to pass in 4 customisation points that look like each other (return true, return false, etc.). For each call site, we need somehow to clarify which behaviour we want to associate to each customisation points. Otherwise there is a risk of mixing up the order of arguments.

We will attempt to solves those two challenges successively: we will first write an implementation that does the job, and then solidify its interface by using strong templates to make sure we can read the code easily and don’t mix up the arguments by accident.

A first implementation of set_bool_implementation

Since the values of the customisation points are known at compile time, we’re going to use template parameters to represent them. The call site will pass in various types, and those types will hold  functions that implement the various customisation points.

Customisation points

Let’s start with the simplest value for a customisation point, return true.

To implement a function associated to a type, we can use static functions inside of that type. And since the function doesn’t carry more meaning that the type itself, we would like to avoid giving it a name. We can do that in C++, but we can give it the shortest name there is: an underscore:

struct ReturnTrue
{
    static bool _() { return true; }
};

We can define the other customisation points in a similar way:

struct ReturnFalse
{
    static bool _() { return false; }
};

struct ReturnHasReachedEndOfFirst
{
    template<typename Iterator1, typename End1, typename Iterator2, typename End2>
    static bool _(Iterator1&& xA, End1&& end1, Iterator2&&, End2&&)
    {
        return xA == end1;
    }
};

struct ReturnHasReachedEndOfSecond
{
    template<typename Iterator1, typename End1, typename Iterator2, typename End2>
    static bool _(Iterator1&&, End1&&, Iterator2&& xB, End2&& end2)
    {
        return xB == end2;
    }
};

struct ReturnHasReachedEndOfBoth
{
    template<typename Iterator1, typename End1, typename Iterator2, typename End2>
    static bool _(Iterator1&& xA, End1&& end1, Iterator2&& xB, End2&& end2)
    {
        return xA == end1 && xB == end2;
    }
};

For the customisation points of the end of the algorithm, we need to compare iterators, so the static function needs to accept them too.

But in fact, return true and return false can also be passed as customisation values for the end of the algorithm. They therefore also need to accept iterators, even if they don’t do anything with them. To handle those cases we add another overload of _ to them:

struct ReturnTrue
{
    static bool _() { return true; }

    template<typename Iterator1, typename End1, typename Iterator2, typename End2>
    static bool _(Iterator1&&, End1&&, Iterator2&&, End2&&)
    {
        return true;
    }
};

struct ReturnFalse
{
    static bool _() { return false; }

    template<typename Iterator1, typename End1, typename Iterator2, typename End2>
    static bool _(Iterator1&&, End1&&, Iterator2&&, End2&&)
    {
        return false;
    }
};

What about the customisation value that consists of moving on?

For the moment, let’s just implement it with no method. We’ll see if we need to add something to it as we go along:

struct MoveOn
{
};

The core of the algorithm

Now we need to flesh out this pseudo-code into real C++:

template <typename SetA, typename SetB, typename Compare>
bool includes(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            1st customisation point
        }
        else if (comp(*xB, *xA))
        {
            2nd customisation point
        }
        else
        {
            3rd customisation point
        }
    }
    4th customisation point
}

To do that, we can pass in the customisation values as extra parameters:

template<typename SetA, typename SetB, typename Compare,
         typename PredicateFirstLessThanSecond,
         typename PredicateSecondLessThanFirst,
         typename PredicateBothEquivalent,
         typename PredicateFinishedTraversal>
bool set_bool_information(SetA&& setA,
                          SetB&& setB,
                          Compare comp,
                          PredicateFirstLessThanSecond,
                          PredicateSecondLessThanFirst,
                          PredicateBothEquivalent,
                          PredicateFinishedTraversal)
{

And the customisation point can just invoke the static _ function of the given type and return the value. Except when the customisation is to move on. In this case, the algorithm needs to increment the iterators and not return anything:

template<typename SetA, typename SetB, typename Compare,
         typename PredicateFirstLessThanSecond,
         typename PredicateSecondLessThanFirst,
         typename PredicateBothEquivalent,
         typename PredicateFinishedTraversal>
bool set_bool_information(SetA&& setA,
                          SetB&& setB,
                          Compare comp,
                          PredicateFirstLessThanSecond,
                          PredicateSecondLessThanFirst,
                          PredicateBothEquivalent,
                          PredicateFinishedTraversal)
{
    auto xA = begin(setA);
    auto xB = begin(setB);
    
    while (xA != end(setA) && xB != end(setB))
    {
        if (comp(*xA, *xB))
        {
            if constexpr (std::is_same_v<PredicateFirstLessThanSecond, MoveOn>)
            {
                ++xA;
            }
            else
            {
                return PredicateFirstLessThanSecond::_();
            }
        }
        else if constexpr (comp(*xB, *xA))
        {
            if (std::is_same_v<PredicateSecondLessThanFirst, MoveOn>)
            {
                ++xB;
            }
            else
            {
                return PredicateSecondLessThanFirst::_();
            }
        }
        else
        {
            if constexpr (std::is_same_v<PredicateBothEquivalent, MoveOn>)
            {
                ++xA;
                ++xB;
            }
            else
            {
                return PredicateBothEquivalent::_();
            }
        }
    }
    return PredicateFinishedTraversal::_(xA, end(setA), xB, end(setB));
}

This code uses two C++17 features: if constexpr and std::is_same_v. But if you need this code to work in earlier versions of C++, it can be adapted easily.

First, if constexpr allows not to compile the else branch if the type is MoveOn. Since the else branch is the one that calls _, as a result we don’t have to implement _ for MoveOn.

With a regular C++98 if, the code would work but we’d need to implement _ for MoveOn even if it’s never called, because the else branch needs to compile even for MoveOn. In C++98, MoveOn can be implemented this way:

struct MoveOn
{
    static bool _() { assert(false); }
};

Second, std::is_same_v can be replaced by C++11’s std::is_same, but with a little adjustment:

if (std::is_same<PredicateFirstLessThanSecond, MoveOn>::value)
{

And std::is_same can even be replicated in C++98 without too much effort if necessary.

A stronger interface

Let’s look at the calling code to implement is_prefix_of:

template <typename Set1, typename Set2, typename Compare = std::less<>>
bool share_element(Set1&& set1, Set2&& set2, Compare comp = std::less<>{})
{
    return set_bool_information(setA,
                                setB,
                                comp,
                                MoveOn{},
                                MoveOn{},
                                ReturnTrue{},
                                ReturnFalse{});
}

comp is the function or function object to compare elements together. By default, this would be std::less.

This interface could be improved: in both cases we’re passing several types, but we don’t express what the correspond to.

The code would be more expressive if we could somehow say at call site: “When the first is less than the second, MoveOn“, “When the second is less than the first, ReturnFalse,” and so on.

This is what we’re going to see in the next post, by using strong templates. Stay tuned!

You will also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin