Jonathan Boccara's blog

How to Check If 2 Sorted Collections Have a Common Element

Published July 3, 2020 - 0 Comments

Ah, the algorithms on sets! Such beautiful algorithms, and so useful too.

The algorithms on sets are basically the algorithms that take sorted collections and compare them in linear time. The STL offers five algorithms on sets: std::set_difference, std::set_intersection, std::set_union, std::set_symmetric_difference, and std::includes.

If you’re a C++ developer, you absolutely, positively, unquestionably need to know your algorithms on sets.

You need to know the algorithms on sets of the STL, but it’s also beneficial to understand how they’re implemented. This lets us create new algorithms on sets.

Indeed, what the STL offers is a good start, but there are many more things we could do on sets to make our everyday coding tasks easier, and that is not in the STL.

In particular, if you’d like to know whether two given sorted collections have an element in common, you’re pretty much left stranded. You could perform a set::intersection and check whether the output is empty or not, but that sounds like a lot of unnecessary work.

To this end, let’s see how to implement share_element, an algorithm that takes two sorted collection and returns a boolean indicating whether they have an element in common.

Thanks to Fluent C++ subscriber Kai-Moritz Kumkar for bringing up the need for share_element!

This post is part of the series on sets:

A generic algorithm for comparing sets

What we call “sets” here is sorted collections. This includes std::sets, but also sorted std::vectors for example.

All algorithms that compare sets have the same type of implementation: iterate in set 1 while encountering elements that are smaller that the first one of set 2. Then iterate on set 2 while encountering elements that are smaller than the one we stopped at in set 1. Then iterate in set 1 again, and so on. And during those iterations, extract the information you need: for set_difference, that would be the elements are only in set 1 for example.

I made a video to illustrate this kind of algorithm, you can check it out here.

This algorithm takes advantage of the facts that the two collections are sorted, which gives it a linear complexity (size1 + size2). If the collections were not sorted we’d have to check the whole collection 2 for each element of the collection 1, which would give a quadratic complexity (size1 * size2).

Some time ago we saw a generic algorithm on sets: set_segregrate. set_segregrate takes two sorted collections, and outputs three: the elements that are only in the first sorted collection, the elements that are only in the second one and the elements that are in both:

set_segregate

To implement set_shared_element, we can get inspired from the code of set_segregate. Indeed, for share_element we’re interested in identifying if there is anything in what set_segregate would output in the “Both” result.

Here is the implementation of set_segregate. The line highlighted in blue is the one where the algorithm outputs results in “Both”:

template<class SetA, class SetB,
         class OutputOnlyA, class OutputBoth, class OutputOnlyB, class Compare, class AddToBoth>
void set_segregate_impl(SetA&& setA, SetB&& setB,
                        OutputOnlyA&& onlyA, OutputBoth&& both, OutputOnlyB&& onlyB,
                        Compare comp, AddToBoth addToBoth)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            *onlyA++ = *xA++;
        }
        else if (comp(*xB, *xA))
        {
            *onlyB++ = *xB++;
        }
        else
        {
            *both++ = addToBoth(*xA++, *xB++);
        }
    }
    std::copy(xA, end(setA), onlyA);
    std::copy(xB, end(setB), onlyB);
}

share_element

We can adapt this code for our purpose. Indeed, it does much more than what we need for share_element. We can trim it down by making it return a bool, replace the place where it fills the “Both” collection with a return true, and those where it didn’t find anything in common with return false:

We can then reorder this code to simplify it:

template<class SetA, class SetB, typename Compare>
bool share_element(SetA&& setA, SetB&& setB, Compare comp)
{
    auto xA = setA.begin();
    auto xB = setB.begin();
    while (xA != setA.end() && xB != setB.end())
    {
        if (comp(*xA, *xB))
        {
            ++xA;
        }
        else if (comp(*xB, *xA))
        {
            ++xB;
        }
        else
        {
            return true;
        }
    }
    return false;
}

That’s about it for the logic of the algorithm.

Comparing with operator< by default

In the above code we’ve used a generic comparator, defined by the template parameter Compare. But often there is a natural way to compare elements: using operator<. Like STL algorithms, let’s provide a second overload of share_element, that uses operator< for comparisons:

template<class LeftRange, class RightRange>
bool share_element(LeftRange const& leftRange, RightRange const& rightRange)
{
    return share_element(leftRange, rightRange, std::less<>{});
}

This overload relies on the magic of std::less<>.

Better than code inspiration, code reuse

Many algorithms on sets, including the STL’s set_difference, set_unionset_intersection and set_symmetric_difference can be implemented with set_segregate.

On the other hand, we didn’t implement share_element with set_segregate. We only got inspired by its code. Is there an even more generic algorithm than set_segregate, that both set_segregate and share_element could reuse for their implementation?

A first step in this direction is to have a generic algorithm that performs checks on sets, returning a boolean. Indeed, like share_elementstd::includes also returns a bool and is not implementable with set_segregate.

Maybe there is a counterpart to set_segregate for performing checks on collections, that std::includes and share_element could reuse in their implementations, and leading to new algorithms?

This is what we explore in future posts. In the meantime if you have an opinion on this, please let me know in the comments section. And if you’d like to contribute to the research on such topics, consider becoming a Patron of Fluent C++!

Stay tuned!

You will also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin