Jonathan Boccara's blog

Predicates on ranges with the STL

Published February 23, 2017 - 6 Comments

Daily C++

In this episode of the STL learning resource, we see algorithms that can be used in a variety of contexts but that have one thing in common: they return a boolean characteristic of one or several ranges.

The *_of series

The STL provides 3 algorithms that indicates whether all, some, or none of the elements of a range satisfy a given condition. The condition is itself expressed by a predicate, that is to say a function pointer (or object) that take an element of the range and returns a bool.

These 3 algorithms are:

  • std::all_of: checks whether all of the elements in the range satisfy the given condition. It returns true if the range is empty, so its semantic is more precisely checking if no element does not satisfy the condition.
  • std::any_of: checks whether any one of the elements in the range satisfies the given condition. It returns false if the range is empty.
  • std::none_of: checks whether no element in the range satisfy the given condition. It returns true if the range is empty.

This is it for the STL, but Boost goes a bit further and proposes the following algorithm:

  • boost::algorithm::one_of: checks whether exactly one element in the range satisfies the given condition. It (quite expectably) returns false if the range is empty.

Boost also provides “*_equal” versions of each of the above algorithms, that accept a range and a value, and have the same behaviour as their native counterpart, with the condition being that the element is equal to the passed value. The comparison is done with operator== and cannot be customised.

And in the case of an empty range, they behave the same way as their native counterparts.

std::equal

std::equal can be used to compare 2 ranges, checking if elements are respectively equal (comparison is done with operator== or with a custom comparator). Note that std::equal takes a 1.5-Range, meaning that the first range is indicated by a begin and an end iterator, while the second range misses the end iterator:

template<template InputIterator1, template InputIterator2 >
bool equal(InputIterator1 first1, InputIterator1 last1,
           InputIterator2 first2);

So the algorithm goes on until the end of the 1st range and performs comparisons with the 2nd range even if it shorter, because it just doesn’t know how long the second range is.

For std::equal, this is both unnatural and dangerous:

  • this is unnatural, because if the 1st range has, say, N elements, std::equal returns true as long as the first N elements of the 2nd range are equal to the N elements of the 1st range, and even if the 2nd range has more elements than the 1st range.
  • this is dangerous, because if the 2nd range is shorter than the 1st range, the algorithm will go past its end, leading to undefined behaviour.

Starting from C++14 this is corrected, with new overloads of std::equal taking 2 complete ranges with both begins and ends.

Checking for permutations

Say that we have two collections. How do you determine if one is a permutation of the other? Or, said differently, if one contains the same elements as the other, even if in a different order?

To do that, the STL offers std::is_permutation.

For instance, given the following collections:

std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v2 = {4, 2, 3, 1, 5};
std::vector<int> v3 = {2, 3, 4, 5, 6};

Calling std::is_permutation this way:

std::is_permutation(v1.begin(), v1.end(),
                    v2.begin(), v2.end());

returns true, while

std::is_permutation(v1.begin(), v1.end(),
                    v3.begin(), v3.end());

returns false, because the elements of v3 are different from those of v1.

Before C++14, std::is_permutation had an 1.5-Range interface, that is to say that it accepted a begin and end for the first range, and only a begin iterator for the second one:

std::is_permutation(v1.begin(), v1.end(),
                    v2.begin());

So if the second collection was smaller than the first one, the algorithm would happily query it past its end until it gets to the end of the first one, thus causing underfined behaviour. The consequence was that you must be sure that the second collection was at least as big as the first one.

But this has been corrected in C++14, that adds the overload taking a begin and an end iterator for both collections.

std::is_permutation compares elements with operator==, and provides an overload that accepts custom comparators.

The algorithmic complexity of std::is_permutation

std::is_permutation has a complexity of “at most O(n²)”.

That can sound surprising: indeed, the algorithms of the STL are known to be implemented with the best possible algorithmic complexity. And it seems like we could do better than quadratic complexity, can’t we?

It turns out we can, but at the expense of extra memory allocation, and if you’re interested to read more about that I suggest you take a look at Quentin’s article Lost in Permutation Complexity. So it’s a trade-off between CPU and memory. Sounds familiar, doesn’t it?

A use case for std::is_permutation

Consider a function that returns a collection of values (or produces it via an output iterator), but does not specify in which order those elements are positioned inside the collection.

How would you write a unit test for this function?

You can’t test an EXPECT_EQ between the expected output and the actual one, since we don’t know what the output should be equal to exactly, since we don’t know the order of its elements.

Instead, you can use std::is_permutation:

std::vector<int> expected = {1, 2, 3, 4, 5};

std::vector<int> results = f();

EXPECT_TRUE(std::is_permutation(begin(expected), end(expected),
                                begin(results), end(results)));

This way you can express that you expect the function f to return 1, 2, 3, 4 and 5, but in any order.

std::mismatch and std::lexicographical_compare

These 2 algorithms let you implement some sort of ordering on ranges, which you can use to compare 2 ranges.

More specifically:

std::mismatch compares respective elements of its 2 input ranges starting from their beginning, and returns the first place where they differ, in the form of an std::pair of iterators: the first element of the pair is an iterator to the first mismatching element in the 1st range, and second element of the pair is an iterator to the first mismatching element in the 2nd range.

It performs comparisons with operator== (or a custom comparator).

template<typename InputIt1, typename InputIt2, typename BinaryPredicate>
std::pair<InputIt1,InputIt2>
    mismatch(InputIt1 first1, InputIt1 last1,
             InputIt2 first2,
             BinaryPredicate p);

Note that std::mismatch also suffers from the 1.5-Range problem, so make sure you pass the shorter range first. This can be cumbersome if you do use it to make comparisons. But just as for std::equal, the 1.5-Range problem is solved for std::mismatch starting from C++14.

std::lexicographical_compare actually provides an order on ranges, and operates the same way as a dictionary would provide an order on strings, hence its name. It compares elements two by two with operator< (or a custom comparator).

template<typename InputIt1, typename InputIt2, typename Compare>
bool lexicographical_compare(InputIt1 first1, InputIt1 last1,
                             InputIt2 first2, InputIt2 last2,
                             Compare comp );

std::lexicographical_compare takes 2 full ranges so it does not have the 1.5-Range problem.

std::lexicographical_compare can be quite handy for allowing a natural and easy to code order on classes wrapping a container. For instance, say when treating CSV-like data that we design an Entry class that represents all the pieces of data separated by commas on a given line in the CSV file:

class Entry
{
public:
    // ...Entry interface...
    bool operator<(const Entry& other)
    {
        return std::lexicographical_compare(begin(data_), end(data_),
                                            begin(other.data_), end(other.data_));
    }
private:
    std::vector<std::string> data_;
};

This allows for an easy sort of entries in a natural way, which gives access to rapid searching and related functionalities (insertion, and so on). It also makes Entry compatible with sorted associative containers like std::map, std::set et alii.

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin

Comments are closed