Published a few months ago
- 2 Comments

Let’s start with the following code excerpt:

1 2 3 4 5 6 7 8 9 10 11 |
std::vector< std::pair<int, std::string> > v1 = ... // v1 is filled with data std::vector< std::pair<int, std::string> > v2 = ... // v2 is filled with data std::vector< std::pair<int, std::string> > results; std::sort(v1.begin(), v1.end()); std::sort(v2.begin(), v2.end()); std::set_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result), compareFirst); |

There are 2 sets of data represented by 2 sorted vectors v1 and v2, on which we apply a `std::set_difference`

(see Algorithms on sets). This `std::set_difference`

writes its output in `results`

, with `std::back_inserter`

ensuring that all outputs are push_back’d into results.

One particularity though: a custom comparison operator is provided to `std::set_difference`

: `compareFirst`

.

By default, `std::set_difference`

compares the elements with the default comparison on `std::pair`

(which compares both the first and second element of the pair), and here with `compareFirst`

we want to compare pairs on their first element only. `compareFirst`

is not in the STL, so we will try to implement it ourselves.

Before jumping on to the implementation, we already have an interesting take away here. Even if `std::set_difference`

expect its input to be sorted, it is possible to use it (or any algorithm on sorted elements) based on a comparator (let’s call it C) different from the comparator used for sort, provided that **the elements are also sorted by this comparator** C. In our case for instance, we use a `std::set_difference`

that compares pairs by their first elements, although these pairs have been sorted by both their first and second elements. But since this implies that they are *a fortiori* sorted by first, it is completely OK to do this.

Now let’s implement `compareFirst`

. A natural, naïve, first-try code would look like:

1 2 3 4 |
bool compareFirst(const std::pair<int, std::string>& p1, const std::pair<int, std::string>& p2) { return p1.first == p2.first; // not final code, bug lurking here! } |

Actually, this implementation won’t give the expected results at all. But why?? After all, set_difference should check whether a given element is equal to another one in the other collection, right?

The least we can say is that this seems completely unnatural, and the remainder of this post will consist in understanding how we came to this, and why this is in fact completely normal.

To understand this, we must view the STL as roughly divided into 2 parts: the part that operates on SORTED elements, and the part that operates on elements that are NOT SORTED.

In this part are associative containters (`std::map`

, `std::multimap`

, `std::set`

, `std::multiset`

), because their elements are sorted.

Some algorithms also fall in this category, because they assume the elements they operate on are sorted: `std::set_difference`

, `std::includes`

or `std::binary_search`

for example.

In this part there are sequence containers (`std::vector`

, `std::list`

, `std::deque`

and `std::string`

), because their elements are not necessarily sorted.

And the algorithms that fall into this category are those that don’t need their elements to be sorted, like `std::equal`

, `std::count`

or `std::find`

for example.

There are two ways to express *“a is the same as b”* in C++:

- the natural way:
`a == b`

. This is called**equality**. Equality is based on**operator==**.

- the other way: a is not smaller than b and b is not smaller than a, so
`!(a<b) && !(b<a)`

. This is called**equivalence**. Equivalence is based on**operator<**.

Then two questions naturally arise about equivalence.

How is it different from equality?

For simple types like `int`

, and actually for most types in practice, equivalence is indeed the same thing as equality. But as pointed by Scott Meyers in Effective STL Item 19, there are some not-too-exotic types where the two are not the same, like case-insensitive strings for example.

Why such a far-fetched way to express a simple thing?

When an algorithm compares elements in a collection, it is easy to understand that there must only be **one way** of comparing them (having several comparators is cumbersome and creates a risk of inconsistency). So a choice needs to be made between comparing based on `operator==`

or on `operator<`

.

In the SORTED part of the STL, the choice is already made: by definition of sorting, the elements must be comparable with operator< (or a customized (operator<)-like function). The UNSORTED part on the other side does not have this constraint and can use the natural operator==.

The **UNSORTED** part of the STL uses **operator==** to perform comparisons, while the **SORTED** part uses **operator<**. And custom comparison operators must follow this logic.

Now we understand how to implement our custom operator `compareFirst`

for `std::set_difference`

, which operates on sorted elements:

1 2 3 4 |
bool compareFirst(const std::pair<int, std::string>& p1, const std::pair<int, std::string>& p2) { return p1.first < p2.first; // correct, STL-compatible code. } |

All of this is crucial to understand in order to use the STL efficiently.

Liked it ? Share this post ! Don't want to miss out ?