Jonathan Boccara's blog

Implementing set_match in One Line of Code

Published July 17, 2020 - 0 Comments

In the previous post we’ve implemented set_match, an algorithm on sets inspired from the STL ones, that pairs up matching elements between two sorted collections.

Being an algorithm on sets, the code we wrote for it looks like a typical implementation of an algorithm on set:

template<typename Set1, typename Set2, typename OutputIterator, typename Comparator>
OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comp)
{
    auto it1 = begin(set1);
    auto it2 = begin(set2);
    
    while (it1 != end(set1) && it2 != end(set2))
    {
        if (comp(*it1, *it2))
        {
            ++it1;
        }
        else if (comp(*it2, *it1))
        {
            ++it2;
        }
        else
        {
            *out = std::make_pair(*it1, *it2);
            ++out;
            ++it1;
            ++it2;
        }
    }
    return out;
}

But since it’s a typical implementation can we rather reuse the code of existing algorithms on sets to implement set_match? Is there a generic code that algorithms on sets can be implemented with?

By reusing other algorithms and libraries, we’re going to implement set_match in one line of code.

This post is part of our growing series on sets:

Refresher on set_match

Here is a brief recap on set_match. If you feel already fresh with the algorithm you can skip to the next section.

The goal of set_match is to identify and pair up equivalent elements between two “sets”, that are sorted collections. For example, with those two maps:

std::map<int, char> input1 = {{1,'1'}, {2,'2'}, {3,'3'}, {5,'5'}, {7,'7'}, {8, '8'}};
std::map<int, std::string> input2 = {{2,"two"}, {3,"three"}, {4,"four"}, {5,"five"}, {7,"seven"}, {11,"eleven"}};

We can call set_match this way:

auto results = std::vector<std::pair<std::pair<int, char>, std::pair<int, std::string>>>{};

set_match(input1, input2, back_inserter(results), NumberCharStringCompare{});

NumberCharStringCompare is a function object that compares maps keys:

struct NumberCharStringCompare
{
    bool operator()(std::pair<int const, char> const& numberWithChar, std::pair<int const, std::string> const& numberWithString)
    {
        return numberWithChar.first < numberWithString.first;
    }
    bool operator()(std::pair<int const, std::string> const& numberWithString, std::pair<int const, char> const& numberWithChar)
    {
        return numberWithString.first < numberWithChar.first;
    }
};

Then the result of calling set_match fills results as if it was initialised this way:

std::vector<std::pair<std::pair<int, char>, std::pair<int, std::string>>> results =
  { { {2,'2'}, {2,"two"}   },
    { {3,'3'}, {3,"three"} },
    { {5,'5'}, {5,"five"}  },
    { {7,'7'}, {7,"seven"} } };

For more details about set_match and the logic behind its implementation, you can check out the detailed article on set_match.

set_segregate: a general algorithm on sets

A while back we built set_segregate, a generalisation of the STL algorithms on sets.

The STL lets you compare sets by determining what elements they have in common and what elements they don’t. For example, std::set_difference takes two sets A and B and produces the elements that are in A but not in B.

set_segregate goes further, by giving you everything at the same time:

  • the elements that are in A but not in B,
  • the elements that in both in A and in B,
  • and the elements that in B but not in A.

It has three output iterators:

template<class Set1, class Set2, class OutputOnly1, class OutputBoth, class OutputOnly2>
void set_segregate(Set1&& set1, Set2&& set2,
                   OutputOnly1 only1, OutputBoth both, OutputOnly2 only2);

For set_match, we would be interested in the second output set, the elements that are both in A and in B.

We need them in the form of a pair, and set_segregate is able to do that. set_segregate detects the underlying type of the output iterator and, if this underlying type happens to be a pair containing the underlying type of set A and the underlying type of set B, it produces pairs as outputs. That’s what we need here.

If you’d like to read more about set_segregate, you can check out the whole story of set_segregate.

To be able to use set_segregate to implement set_match, we only need to discard the first and third outputs of set_segregate.

One naive way to do this would be by filling containers that we don’t use:

template<typename Set1, typename Set2, typename OutputIterator, typename Comparator>
OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comparator)
{
    auto unused1 = std::vector<typename std::remove_reference_t<Set1>::value_type>{};
    auto unused2 = std::vector<typename std::remove_reference_t<Set2>::value_type>{};
    set_segregate(std::forward<Set1>(set1), std::forward<Set2>(set2), back_inserter(unused1), out, back_inserter(unused2), comparator);
    return out;
}

But this is waste of execution time because it makes copies, a waste of memory to hold those copies, and a burden for code readability.

How can we write code that goes to the point, by just discarding the data we don’t need?

Breaking in the output iterator

set_segregate, like STL algorithms, produce its results to output iterators. The STL provide various output iterators, such as back_inserter that push_back elements to a std::vector, or begin that overrides the contents of already filled collection.

But nothing prevents us from writing our own output iterators, and that’s what the pipes library does.

Here we’re going to use the dumbest of the smart output iterators: dev_null, that ignores the value it receives.

The implementation of dev_null is the following:

struct dev_null
{
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = void;
    using pointer = void;
    using reference = void;

    dev_null& operator*(){ return *this; }
    dev_null& operator++(){ return *this; }
    
    template<typename T>
    dev_null& operator=(T&&){ return *this; }
};

The 5 first aliases are necessary to define an iterator, and they are used by STL algorithms.

The algorithms of the STL, as well as set_segregate, send data to their output iterator like this:

*out = value;
++out;

Or sometimes it’s shortened to this:

*out++ = value;

Although I find the first version more readable.

Either way, we can understand this syntax by imagining that out is the begin of a std::vector. In that case:

  • *out is a reference to the first element of the vector,
  • *out = value writes over this first element,
  • ++out moves the iterator on to the next element.

dev_null offers operators that are compatible with that syntax, but that do nothing. And to make operator= also do nothing, operator* returns a reference to dev_null itself, so that *out = value calls the operator= of dev_null, which does nothing.

Muting set_segregate with dev_null

Now we can use dev_null to discard the outputs of set_segregate that we’re not interested in:

template<typename Set1, typename Set2, typename OutputIterator, typename Comparator>
OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comparator)
{
    set_segregate(std::forward<Set1>(set1), std::forward<Set2>(set2), dev_null{}, out, dev_null{}, comparator);
    return out;
}

Even if the algorithm is passing data to dev_null, there is no copy involved since dev_null takes data by reference.

Now the implementation of set_match is down to one meaningful line of code (not counting the line with return out).

An algorithm to rule them all?

When you think about it, there is another algorithm that looks a lot like set_match: it’s the standard algorithm std::set_intersection. It does everything like set_match except that, instead of returning pairs of matching elements, it returns the value coming from the first set.

The implementation of set_intersection must be very close to the one of set_match. Can we share some code between set_match and set_intersection? What about the other STL algorithms on sets?

It turns out we can implement a bunch of STL algorithms on sets with a common algorithm. This is what we see in the next post of our series on sets. Stay tuned!

You will also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin