set_match: Matching up Elements Between Sorted Collections

Published July 10, 2020

The STL offers a handful of algorithms on sets. They are important to master, but they’re only the tip of the iceberg of what we can do with sets.

In this post and the next few ones, we’re going to get deeper into the topic of algorithms on sets, by extending the algorithms on sets that the STL already offers.

Here are the topics of our series of algorithms on sets so far:

How to check if 2 sorted collections have a common element
set_match: Matching up Elements Between Sorted Collections
Implementing set_match in One Line of Code
STL algorithms on sets: one algorithm to implement them all
Algorithms on set returning a boolean: exploring the algorithms
Algorithms on set returning a boolean: implementing a generic algorithm
Algorithms on set returning a boolean: a strong template interface
NWH: Adapting algoritms on sets

Today we look at how to match up equivalent elements between sets.

The problem

In C++, we call “sets” sorted collections. std::sets are sets, but not only. For example, std::maps and sorted std::vectors are also “sets” by that definition.

So we have two sorted collections, and we’d like to match up the equivalent elements between the two.

A typical case where this is useful with maps with common keys, and we’d like to pair up the matching elements with the same keys, in order to put their values together.

Let’s take two maps (maps are sorted collections) to illustrate:

std::map<int, char> input1 = {{1,'1'}, {2,'2'}, {3,'3'}, {5,'5'}, {7,'7'}, {8, '8'}};
std::map<int, std::string> input2 = {{2,"two"}, {3,"three"}, {4,"four"}, {5,"five"}, {7,"seven"}, {11,"eleven"}};

The two maps have some keys in common: 2, 3, 5 and 7.

We’d like to obtain a collection that pairs up the corresponding elements in the map:

std::vector<std::pair<std::pair<int, char>, std::pair<int, std::string>>> results =
  { { {2,'2'}, {2,"two"}   },
    { {3,'3'}, {3,"three"} },
    { {5,'5'}, {5,"five"}  },
    { {7,'7'}, {7,"seven"} } };

Let’s design an algorithm, say set_match, to implement this.

How do we go about implementing set_match?

Also, all the algorithms on sets are in linear time. Can we keep this complexity here too?

`set_match`

There are several things to consider for the design of set_match.

Comparing the keys

With set_match, we need to determine if two elements coming from the two collections are equivalent (not equal, but equivalent). In our case, that means to have equivalent keys.

So we need to be able to compare the keys of the elements of the maps. The maps contain std::pairs of keys and values, but operator< on std::pair doesn’t compare on the key (the .first) only. It performs the comparison on both the key and the value (.first and .second). This is not what we want.

To compare on the key only, we have to define a comparison operator:

struct NumberCharStringCompare
{
    bool operator()(std::pair<int const, char> const& numberWithChar, std::pair<int const, std::string> const& numberWithString)
    {
        return numberWithChar.first < numberWithString.first;
    }
    bool operator()(std::pair<int const, std::string> const& numberWithString, std::pair<int const, char> const& numberWithChar)
    {
        return numberWithString.first < numberWithChar.first;
    }
};

Note that we’ve used the double functor trick to implement the comparison in both directions.

set_match has to accept two sets and a comparator (such as NumberCharStringCompare). To allow it to produce its output, let’s also give it an output iterator. This will allow it to be consistent with the algorithms of the STL, and it’s a good thing to respect the conventions of the STL.

Its prototype is then:

template<typename Set1, typename Set2, typename OutputIterator, typename Comparator>
OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comp)

We make it return the output iterator to be consistent with the STL on that too.

Implementing `set_match`

All the algorithms on sets have the same structure. They compare elements of the two sets together on move on this way:

if the one in the first set is smaller, move on in the first set,
if the one in the second set is smaller, move on in the second set,
if they are equivalent, move on in both sets.

This is what gives them a linear complexity.

What makes the difference between the various algorithms is the additional operations we perform in either one of those three cases before moving on.

For a detailed example of how this algorithm works in practice, you can take a look at how set_difference is implemented.

What’s specific to set_match is what it does in the case of two equivalent elements: it pairs them up and sends that pair to the output iterator.

Put another way, the algorithm for set_match is this: we compare the first elements of both sets together, then:

if the one in the first set is smaller, move on in the first set,
if the one in the second set is smaller, move on in the second set,
if they are equivalent, send the pair to the output and move on in both sets.

Here is what is looks like in code:

template<typename Set1, typename Set2, typename OutputIterator, typename Comparator>
OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comp)
{
    auto it1 = begin(set1);
    auto it2 = begin(set2);
    
    while (it1 != end(set1) && it2 != end(set2))
    {
        if (comp(*it1, *it2))
        {
            ++it1;
        }
        else if (comp(*it2, *it1))
        {
            ++it2;
        }
        else
        {
            *out = std::make_pair(*it1, *it2); // <- the specific operation
            ++out;
            ++it1;
            ++it2;
        }
    }
    return out;
}

Okay. Let’s try this out with our original inputs:

std::map<int, char> input1 = {{1,'1'}, {2,'2'}, {3,'3'}, {5,'5'}, {7,'7'}, {8, '8'}};
std::map<int, std::string> input2 = {{2,"two"}, {3,"three"}, {4,"four"}, {5,"five"}, {7,"seven"}, {11,"eleven"}};

auto results = std::vector<std::pair<std::pair<int, char>, std::pair<int, std::string>>>{};

set_match(input1, input2, back_inserter(results), NumberCharStringCompare{});

After this code has executed, results should contain the matching elements of the two maps, paired up.

To check that, let’s put some code together to print the contents of results:

void print(std::pair<std::pair<int, char>, std::pair<int, std::string>> pair)
{
    std::cout << pair.first.first << '-' << pair.first.second << '|' << pair.second.first << '-' << pair.second.second << '\n';
}

And let’s invoke it:

std::for_each(begin(results), end(results), print);

The program outputs:

2-2|2-two
3-3|3-three
5-5|5-five
7-7|7-seven

This is indeed what we expected. You can find the whole code of the program here for reference.

`set_match` in one line of code

set_match is an algorithm on sets with a quite classical implementation. Could we reuse some other code that performs this classical implementation and implement set_match with it?

It would be nice to have a generic algorithm that takes care of the general structure of the traversal and comparison of the two sets, and which we would reuse to implement other algorithms like set_match.

This is what we see in the next post, where we’ll implement set_match in one line of code. Stay tuned!