Jonathan Boccara's blog

Algorithms on Sets That Return a Boolean: Exploring the algorithms

Published July 31, 2020 - 0 Comments

In a previous article on sets we’ve designed share_element, an algorithm on sets (sorted collections) that returns a boolean indicating whether they have a element in common, and that operates in linear time.

On the other hand, the STL also offers an algorithm on sets that return a boolean: std::includes. std::includes takes two sets and returns a boolean indicating whether the first one contains the elements of the second one. It also operates in linear time.

By looking at what share_element and std::includes have in common, we will uncover other interesting algorithms that compare sets together and return a boolean.

This post is part of the series on algorithms on sets:

share_element and std::includes: a starting point

Let’s look at our implementation of share_element:

template<class SetA, class SetB, typename Compare>
bool share_element(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            ++xA;
        }
        else if (comp(*xB, *xA))
        {
            ++xB;
        }
        else
        {
            return true;
        }
    }
    return false;
}

Now let’s look at an implementation of the std::includes STL algorithm:

template <typename SetA, typename SetB, typename Compare>
bool includes(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            ++xA;
        }
        else if (comp(*xB, *xA))
        {
            return false;
        }
        else
        {
            ++xA;
            ++xB;
        }
    }
    return xB == setB.end();
}

We can see that they have the same structure. They only differ at a few places, where they return different booleans.

If we generalize this structure, an algorithm on sets that returns a boolean has 4 customisation points:

template <typename SetA, typename SetB, typename Compare>
bool includes(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            1st customisation point
        }
        else if (comp(*xB, *xA))
        {
            2nd customisation point
        }
        else
        {
            3rd customisation point
        }
    }
    4th customisation point
}

On the first 3 customisation points, the algorithm can either return a boolean or move on by incrementing the iterators. On the 4th one, it has to return a boolean.

A combination of possibilities

Put another way, here is the list of possibilities for each customisation point:

  • 1st customisation point:
    • return true
    • return false
    • move on (++xA)
  • 2nd customisation point:
    • return true
    • return false
    • move on (++xB)
  • 3rd customisation point:
    • return true
    • return false
    • move on (++xA; ++xB;)
  • 4th customisation point:
    • return true
    • return false
    • the end of setA is reached (xA == setA.end())
    • the end of setB is reached (xB == setB.end())
    • the end of both is reached (xA == setA.end() && xB == setB.end())

This makes a total of 3×3×3×5 = 135 possible algorithms!

std::includes and share_element are just two of them.

share_element corresponds to this combination:

  • 1st customisation point: move on
  • 2nd customisation point: move on
  • 3rd customisation point: return true
  • 4th customisation point: return false

And std::includes corresponds to this combination:

  • 1st customisation point: move on
  • 2nd customisation point: return false
  • 3rd customisation point: move on
  • 4th customisation point: reached the end of setB

All this brings an obvious question: What are the 133 other algorithms?

Exploring the combinations

133 is a large number of algorithms. But it turns out that we can prune off some of them because they mean something that is not useful or because they don’t mean anything at all.

What’s left after pruning off the combinations are a handful of algorithm nuggets!

Before getting to the nuggets, let’s see how some combinations are not worth retaining.

Combinations that mean something not interesting

Let’s see an example of an algorithm that means something, but that is not useful.

Take the following combination:

  • 1st customisation point: move on,
  • 2nd customisation point: move on,
  • 3rd customisation point: move on
  • 4th customisation point: reached the end of setA

Its code looks like that:

template <typename SetA, typename SetB, typename Compare>
bool myAlgorithm(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            ++xA;
        }
        else if (comp(*xB, *xA))
        {
            ++xB;
        }
        else
        {
            ++xA;
            ++xB;
        }
    }
    return xA == setA.end();
}

This algorithms traverses the two sets until it reaches the end of one of them. When it does, it returns a boolean indicating if it reached the end of setA.

This means that this algorithm indicates whether the size of setA is less or equal than the size of setB. In general, this is something we can get in less than linear time. For example, if we’re using std::sets, we can just call their .size() methods and compare them.

So there is little point to the algorithm coming out of this particular combination.

Combinations that don’t mean anything

Now the we’ve seen an algorithm that means something useless, let’s see an example of a combination that results in an algorithm that doesn’t mean anything.

Or I should rather say, an algorithm where I didn’t see any meaning.

Consider the following combination:

  • 1st customisation point: move on,
  • 2nd customisation point: return false,
  • 3rd customisation point: return true,
  • 4th customisation point: reached the end of setA.

Let’s see the corresponding code:

template <typename SetA, typename SetB, typename Compare>
bool myAlgorithm(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            ++xA;
        }
        else if (comp(*xB, *xA))
        {
            return false;
        }
        else
        {
            return true;
        }
    }
    return xA == setA.end();
}

This algorithms does something, and I don’t know about you but I can’t see any meaning in it.

Basically every algorithm that has a return true and a return false in the first three customisation points doesn’t have any meaning in my opinion. Indeed, you don’t know from the call site whether the algorithm has reached the end of any of the sets before returning.

That said, I examined each of the 135 combinations, and I could well have overlooked the meaning of some algorithms and discard them too quickly. If you see an algorithm with useful meaning that isn’t listed in the nuggets that follow, please share your discovery in a comment!

The nuggets

Here are 6 combinations that have meaning and are useful.

Determining whether the first set is a prefix of the second one

The useful combination:

  • 1st customisation point: return false,
  • 2nd customisation point: return false,
  • 3rd customisation point: move on,
  • 4th customisation point: reached the end of setA.

Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is not in common between the two (it then returns false), or the end of setA (it returns true).

We can call this algorithm is_prefix_of.

Determining whether either set is a prefix of the other

The useful combination:

  • 1st customisation point: return false,
  • 2nd customisation point: return false,
  • 3rd customisation point: move on,
  • 4th customisation point: return true.

Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is not in common between the two (it then returns false), or the end of any of the two sets (it returns true).

Note that we could achieve the same result by calling is_prefix_of twice and swapping the arguments, but this would result in traversing the set twice.

We can call this new algorithm is_one_prefix_of_other.

Determining whether two sets have the same elements

The useful combination:

  • 1st customisation point: return false,
  • 2nd customisation point: return false,
  • 3rd customisation point: move on,
  • 4th customisation point: reached the end of both.

Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is not in common between the two (it then returns false), or the end of both sets (it returns true).

It is in the same spirit as std::equal, but note that strictly speaking we can’t use std::equal with sets, because std::equal uses operator== and sorted collections are only required to have operator<. Read more about equality and equivalence here.

We can call this algorithm equivalent.

Determining whether two sets have no element in common

The useful combination:

  • 1st customisation point: move on,
  • 2nd customisation point: move on,
  • 3rd customisation point: return false,
  • 4th customisation point: return true.

Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is in common between the two (it then returns false), or the end of any set (it returns true). Since the sets are sorted, the remaining part of the other set has elements that are greater than those examined, so not in common.

We can call this algorithm disjoint.

Note that disjoint is also the negation of share_element.

Determining whether all the elements of the first set are smaller than the smallest of the second one

The useful combination:

  • 1st customisation point: move on,
  • 2nd customisation point: return false,
  • 3rd customisation point: return false,
  • 4th customisation point: return true.

Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is in common between the two (it then returns false), or an element of the second set that would be smaller than one of the first set (it also returns false). If it reaches the end of any set and that didn’t happen, it returns true.

We can call this algorithm is_before.

Determining whether all the elements of the second set are smaller than the smallest of the first one

The useful combination:

  • 1st customisation point: return false,
  • 2nd customisation point: move on,
  • 3rd customisation point: return false,
  • 4th customisation point: return true.

Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is in common between the two (it then returns false), or an element of the first set that would be smaller than one of the second set (it also returns false). If it reaches the end of any set and that didn’t happen, it returns true.

We can call this algorithm is_after.

Note that is_after is not the negation of is_before, because two sets with intertwined elements would return false for both algorithms.

But is_after is equivalent to swapping the elements of is_before. However, it’s useful to offer the possibility to write both, the same way that we have operator< and operator> in C++, so that we can choose for each given call site which one is the most expressive.

In fact, is_after is almost equivalent to swapping the elements of is_before. But as we’ll see in a future post, there is a subtlety that prevents us from implementing it this way anyway.

A common algorithm to implement all that

In summary, we have 8 interesting algorithms on sets that return a boolean:

  • std::includes
  • share_element
  • is_prefix_of
  • is_one_prefix_of_other
  • equivalent
  • disjoint
  • is_before
  • is_after

Would it be possible to write a common algorithm that takes the combination of the 4 customisation points and returns a boolean?

This is what we see in a next blog post. Stay tuned!

You will also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin