Jonathan Boccara's blog

Custom comparison, equality and equivalence with the STL

Published February 16, 2017 - 2 Comments

Daily C++

Let’s start with the following code excerpt: 

std::vector< std::pair<int, std::string> > v1 = ... // v1 is filled with data
std::vector< std::pair<int, std::string> > v2 = ... // v2 is filled with data
std::vector< std::pair<int, std::string> > results;
  
std::sort(v1.begin(), v1.end());
std::sort(v2.begin(), v2.end());
  
std::set_difference(v1.begin(), v1.end(),
                    v2.begin(), v2.end(),
                    std::back_inserter(result),
                    compareFirst);

There are 2 sets of data represented by 2 sorted vectors v1 and v2, on which we apply a std::set_difference (see Algorithms on sets). This std::set_difference writes its output in results, with std::back_inserter ensuring that all outputs are push_back’d into results.

One particularity though: a custom comparison operator is provided to std::set_difference: compareFirst.

By default, std::set_difference compares the elements with the default comparison on std::pair (which compares both the first and second element of the pair), and here with compareFirst we want to compare pairs on their first element only. compareFirst is not in the STL, so we will try to implement it ourselves.

Before jumping on to the implementation, we already have an interesting take away here. Even if std::set_difference expect its input to be sorted,  it is possible to use it (or any algorithm on sorted elements) based on a comparator (let’s call it C) different from the comparator used for sort, provided that the elements are also sorted by this comparator C. In our case for instance, we use a std::set_difference that compares pairs by their first elements, although these pairs have been sorted by both their first and second elements. But since this implies that they are a fortiori sorted by first, it is completely OK to do this.

Now let’s implement compareFirst. A natural, naïve, first-try code would look like:

bool compareFirst(const std::pair<int, std::string>& p1, const std::pair<int, std::string>& p2)
{
    return p1.first == p2.first; // not final code, bug lurking here!
}

Actually, this implementation won’t give the expected results at all. But why?? After all, set_difference should check whether a given element is equal to another one in the other collection, right?

The least we can say is that this seems completely unnatural, and the remainder of this post will consist in understanding how we came to this, and why this is in fact completely normal.

To understand this, we must view the STL as roughly divided into 2 parts: the part that operates on SORTED elements, and the part that operates on elements that are NOT SORTED.

The SORTED part of the STL

In this part are associative containters (std::map, std::multimap, std::set, std::multiset), because their elements are sorted.

Some algorithms also fall in this category, because they assume the elements they operate on are sorted: std::set_difference, std::includes or std::binary_search for example.

The UNSORTED part of the STL

In this part there are sequence containers (std::vector, std::list, std::deque and std::string), because their elements are not necessarily sorted.

And the algorithms that fall into this category are those that don’t need their elements to be sorted, like std::equal, std::count or std::find for example.

Comparing elements

There are two ways to express “a is the same as b” in C++:

  • the natural way: a == b. This is called equality. Equality is based on operator==.
  • the other way: a is not smaller than b and b is not smaller than a, so !(a<b) && !(b<a). This is called equivalence. Equivalence is based on operator<.

Then two questions naturally arise about equivalence.

How is it different from equality?

For simple types like int, and actually for most types in practice, equivalence is indeed the same thing as equality. But as pointed by Scott Meyers in Effective STL Item 19, there are some not-too-exotic types where the two are not the same, like case-insensitive strings for example.

Why such a far-fetched way to express a simple thing?

When an algorithm compares elements in a collection, it is easy to understand that there must only be one way of comparing them (having several comparators is cumbersome and creates a risk of inconsistency). So a choice needs to be made between comparing based on operator== or on operator<.

In the SORTED part of the STL, the choice is already made: by definition of sorting, the elements must be comparable with operator< (or a customized (operator<)-like function). The UNSORTED part on the other side does not have this constraint and can use the natural operator==.

Implementing the comparator

The UNSORTED part of the STL uses operator== to perform comparisons, while the SORTED part uses operator<. And custom comparison operators must follow this logic.

Now we understand how to implement our custom operator compareFirst for std::set_difference, which operates on sorted elements:

bool compareFirst(const std::pair<int, std::string>& p1, const std::pair<int, std::string>& p2)
{
    return p1.first < p2.first; // correct, STL-compatible code.
}

All of this is crucial to understand in order to use the STL efficiently.

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin

Comments are closed