Jonathan Boccara's blog

Comparing C++ Containers with Lexicographical Comparison

Published December 20, 2019 - 0 Comments

Daily C++What does is mean to compare two collections of objects to determine which collection is smaller?

Even if comparison is natural for some types, comparing compound types that contain them can be trickier. For example, real numbers have a natural order (1.414 is smaller than 3.14) but complex numbers don’t have an order (1 + i is not “smaller” than 1 + 2i). This difference is reflected in C++ in that there is an operator< for double, but there isn’t one for std::complex.

However, for type std::pair, we can write the following:

auto p1 = std::pair{1, 1};
auto p2 = std::pair{1, 2};

auto const p1smaller = p1 < p2;

Even though a complex number is conceptually close to a pair, the above code compiles and p1smaller equals true in this case.

This also works for std::tuple, as well as for all STL containers, such as std::vector:

auto v1 = std::vector{1, 2, 3, 4, 5};
auto v2 = std::vector{2, 3, 4, 5, 6};

auto const v1smaller = v1 < v2;

In the above code v1smaller is also true.

Writing p1 == p2 or v1 == v2 or c1 == c2 (if c1 and c2 are std::complex numbers) also exists and has a natural meaning: the two containers have the same elements in the same order.

But v1 < v2 needs a special definition. In C++, this is lexicographical comparison.

Lexicographical comparison

Before defining lexicographical comparison, let’s review the possible options for determining which one of two vectors (or pair, or tuple, or set, etc.) is smaller.

One this that comes to mind is comparing their size. The vector with the less elements would be the “smaller” one. Even if this can make some sense regarding the English meaning of the word “smaller”, this comparison would not be practical, because a lot of vectors would then be equivalent.

To illustrate, imagine you have a collection of vectors of the same size. Using their sizes to compare them would mean that we couldn’t sort that collection (or rather that it would be sorted regardless of the order its elements). This would prevent performing a binary search on it, for instance.

Since comparing on size would not be practical, we could compare collections based on the values they contain. What if we defined that v1 is smaller than v2 iif all the elements of v1 are smaller than all the elements of v2? Or, said differently, that max(v1) is smaller than min(v2)?

This also would not be practical, because all vectors could not be compared together, for instance {1, 2, 3, 4, 5} could not be compared with {2, 3, 4, 5, 6}. An empty vector would be difficult to compare too, because it doesn’t have a minimum nor a maximum.

Another possibly would be to compare elements two by two: {1, 2, 3, 4, 5} would be smaller than {2, 3, 4, 5, 6} because 1<2 and 2<3 and 3<4 etc. But some vectors could still not be compared together, such as {1, 2, 1} and {2, 1, 2}.

Fortunately, there is a way to compare collections that is both natural and practical for programming purposes: lexicographical comparison.

Lexicographical comparison has existed since long before computers and algorithms were around; lexicographical comparison is what dictionaries use to compare words. Indeed, words can be seen a collections of letters (which is why std::string in C++ has an interface of container just like std::vector) and determining which one of two words should appear before the other one is a dictionary comes down to comparing two collections (of letters) together. As long as the values inside of two collections are comparable together, we can perform a lexicographical comparison on those collections.

Like in a dictionary, the algorithm starts by comparing the first elements of the two collections. If the first one is smaller then the collection is smaller. If the second one is smaller, then the second collection is smaller. If neither one is smaller, we perform the same check on the second elements. If we reach the end of one of the collection, then it is the smaller one.

v1 < v2 and p1 < p2 perform lexicographical comparisons. c1 < c2 could have done the same in theory, but complex numbers don’t define an order in maths.

std::lexicographical_compare

One of the STL algorithms, std::lexicographical_compare, also performs a lexicographical comparison between two collections:

auto v1 = std::vector{1, 2, 3, 4, 5};
auto v2 = std::vector{2, 3, 4, 5, 6};

auto const v1smaller = std::lexicographical_compare(begin(v1), end(v1), begin(v2), end(v2));

Or, if we wrap this algorithm in a function that takes two ranges (which you should do with your algorithms before it becomes standard in C++20):

auto v1 = std::vector{1, 2, 3, 4, 5};
auto v2 = std::vector{2, 3, 4, 5, 6};

auto const v1smaller = ranges::lexicographical_compare(v1, v2);

But then, why an algorithm if operator< already does the same thing? And what’s more, an algorithm with the second longest name in the whole STL?

std::lexicographical_compare is more powerful than operator<, in that it can do at least 3 things that operator< can’t:

1) std::lexicographical_compare can compare vectors that contains different types of values.

The following code does not compiles:

auto v1 = std::vector<int>{1, 2, 3, 4, 5};
auto v2 = std::vector<double>{2, 3, 4, 5, 6};

auto const v1smaller = v1 < v2;

because v1 and v2 are not of the same type, despite the fact than ints can be compared with doubles.

But using std::lexicographical_compare makes it compile:

auto v1 = std::vector{1, 2, 3, 4, 5};
auto v2 = std::vector<double>{2, 3, 4, 5, 6};

auto const v1smaller = ranges::lexicographical_compare(v1, v2);

2) std::lexicographical_compare can compare containers of different types.

The following code comparing a vector with a set does not compile:

auto v1 = std::vector<int>{1, 2, 3, 4, 5};
auto s2 = std::set<int>{2, 3, 4, 5, 6};

auto const v1smaller = v1 < s2;

But this one does:

auto v1 = std::vector<int>{1, 2, 3, 4, 5};
auto s2 = std::set<int>{2, 3, 4, 5, 6};

auto const v1smaller = ranges::lexicographical_compare(v1, s2);

And finally:

3) std::lexicographical_compare allows custom comparators.

If you use a collection a pairs representing keys and values for example, you may want to perform comparison based on keys only:

auto v1 = std::vector<std::pair<int, std::string>>{{1, "one"}, {2, "two"}, {3, "three"}};
auto v2 = std::vector<std::pair<int, std::string>>{{2, "two"}, {3, "three"}, {4, "four"}};

auto const v1smaller = std::lexicographical_compare(begin(v1), end(v1),
                                                    begin(v2), end(v2),
                                                    [](auto const& p1, auto const& p2){ return p1.first < p2.first;});

And operator< doesn’t allow such custom comparison operators.

As an example of using these three features together, we could use std::lexicographical_compare to compare a std::vector<std::pair<int, std::string>> with a std::map<double, std::string> by comparing keys together:

auto v1 = std::vector<std::pair<int, std::string>>{{1, "one"}, {2, "two"}, {3, "three"}};
auto m2 = std::map<double, std::string>{{2, "two"}, {3, "three"}, {4, "four"}};

auto const v1smaller = std::lexicographical_compare(begin(v1), end(v1),
                                                    begin(m2), end(m2),
                                                    [](auto const& p1, auto const& p2){ return p1.first < p2.first;});

Is v1 < v2 that natural?

If you don’t need the extra features brought by std::lexicographical_compare, the simplest way of comparing STL containers is to use operator<. And for comparing pairs and tuple, you have to use operator< anyway because STL algorithms don’t operate on them.

But do you find the expression v1 < v2 natural? Would you interpret this as a lexicographical comparison when you read code, or would you prefer it spelled out explicitly by using std::lexicographical_compare even in the simple cases? Let me know your opinion by leaving a comment below.

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin