Jonathan Boccara's blog

Understanding the implementation of std::is_permutation

Published June 25, 2019 - 0 Comments

Knowing your STL algorithms is a good thing. And knowing what’s inside of them is a great way to go further in their study.

In that spirit, let’s dig into the implementation of std::is_permutation. It’s a nice algorithm to study, as it can be implemented by using other STL algorithms and it has some interesting subtleties. But nothing impossibly complicated.

As a reminder on algorithms on permutations, is_permutation takes two collections (in the form of iterators of begin and end), and returns a bool. This bool indicates whether the two collections have the same contents, but possibly not in the same order.

A naive (but wrong) implementation of is_permutation

The complexity of is_permutation, as described by the C++ standard, is O(n²), where n is the size of the first collection.

As a side note, there are ways to implement is_permutation with a better algorithmic complexity, at the expense of other parameters – check out Quentin Duval’s great analysis on the topic if you want to read more about that. But here, we focus on a standard-like implementation.

With a quadratic complexity, the first idea that comes to mind is to go over the first collection, and check for each element to see if it is part of the other one:

template<typename ForwardIterator1, typename ForwardIterator2>
bool my_is_permutation(ForwardIterator1 first1, ForwardIterator1 last1,
                    ForwardIterator2 first2, ForwardIterator2 last2)
{
    for (auto current1 = first1; current1 != last1; ++current1)
    {
        if (std::find(first2, last2, *current1) == last2)
        {
            return false;
        }
    }
    return true;
}

If we test it with two collections that are permutations of each other:

std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v2 = {3, 2, 5, 4, 1};
std::cout << my_is_permutation(begin(v1), end(v1), begin(v2), end(v2)) << '\n';

This outputs:

1

All good.

Now let’s test it with two collections that are not permutations of each other:

std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v3 = {3, 2, 6, 4, 1};
std::cout << my_is_permutation(begin(v1), end(v1), begin(v3), end(v3)) << '\n';

It now outputs:

0 

Still OK. Is it a correct implementation then?

libc++ implementation

Let’s compare it with the one from libc++, the implementation of the standard library used by clang:

template<class _BinaryPredicate, class _ForwardIterator1, class _ForwardIterator2>
_LIBCPP_CONSTEXPR_AFTER_CXX17 bool
__is_permutation(_ForwardIterator1 __first1, _ForwardIterator1 __last1,
                 _ForwardIterator2 __first2, _ForwardIterator2 __last2,
                 _BinaryPredicate __pred,
                 forward_iterator_tag, forward_iterator_tag )
{
//  shorten sequences as much as possible by lopping of any equal prefix
    for (; __first1 != __last1 && __first2 != __last2; ++__first1, (void) ++__first2)
        if (!__pred(*__first1, *__first2))
            break;
    if (__first1 == __last1)
        return __first2 == __last2;
    else if (__first2 == __last2)
        return false;

    typedef typename iterator_traits<_ForwardIterator1>::difference_type _D1;
    _D1 __l1 = _VSTD::distance(__first1, __last1);

    typedef typename iterator_traits<_ForwardIterator2>::difference_type _D2;
    _D2 __l2 = _VSTD::distance(__first2, __last2);
    if (__l1 != __l2)
        return false;

    // For each element in [f1, l1) see if there are the same number of
    //    equal elements in [f2, l2)
    for (_ForwardIterator1 __i = __first1; __i != __last1; ++__i)
    {
    //  Have we already counted the number of *__i in [f1, l1)?
        _ForwardIterator1 __match = __first1;
        for (; __match != __i; ++__match)
            if (__pred(*__match, *__i))
                break;
        if (__match == __i) {
            // Count number of *__i in [f2, l2)
            _D1 __c2 = 0;
            for (_ForwardIterator2 __j = __first2; __j != __last2; ++__j)
                if (__pred(*__i, *__j))
                    ++__c2;
            if (__c2 == 0)
                return false;
            // Count number of *__i in [__i, l1) (we can start with 1)
            _D1 __c1 = 1;
            for (_ForwardIterator1 __j = _VSTD::next(__i); __j != __last1; ++__j)
                if (__pred(*__i, *__j))
                    ++__c1;
            if (__c1 != __c2)
                return false;
        }
    }
    return true;
}

Wow. This looks a lot more elaborate than our naive attempt!

Our attempt can indeed be broken quite easily, with the following example:

std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v4 = {3, 2, 4, 4, 1};
std::cout << my_is_permutation(begin(v4), end(v4), begin(v1), end(v1)) << '\n';

Which outputs:

1

It says that they are permutations of each other, while they really aren’t.

So let’s see what should be in the implementation of is_permutation to make it correct.

Implementing a correct version of is_permutation

The problem with our previous version of is_permutation is that it doesn’t deal with the case of multiple occurences of the same value. What we want to check if each value in the first collection appears the same number of times in both collections, and that both collections have the same size.

We can change our algorithms in that sense:

template<typename ForwardIterator1, typename ForwardIterator2>
bool my_is_permutation(ForwardIterator1 first1, ForwardIterator1 last1,
                       ForwardIterator2 first2, ForwardIterator2 last2)
{
    if (std::distance(first1, last1) != std::distance(first2, last2)) return false;
    
    for (auto current1 = first1; current1 != last1; ++current1)
    {
        auto const numberOfOccurencesIn1 = std::count(first1, last1, *current1);
        auto const numberOfOccurencesIn2 = std::count(first2, last2, *current1);
        if (numberOfOccurencesIn1 != numberOfOccurencesIn2)
        {
            return false;
        }
    }
    return true;
}

The algorithm now has a guard at the beginning, to check for the size of the two passed ranges. Then it checks that each value from the first collection is represented as many times in the second one.

This version of the algorithm passes all the previous tests (which is admittedly not enough for a test suite, we’d need at least to test for empty collections, collections of different sizes, etc. but here we focus on the algorithm rather than how to constitute the test suite – which is an equally important topic though).

Our implementation is getting more elaborate, but it’s nowhere near the one of libc++! What features are missing from our implementation of is_permutation?

We’ve got the core of the algorithm right, but there are ways we can optimize it.

Discarding useless work in is_permutation

Our current version of is_permutation does way too many things. Here are a few ways to cut down some of its operations.

Similar prefix

A first thing to note is that if the two collections start with a similar sequence of elements, all there is to do is to check if their respective remainders are permutations of each other. So we can start by advancing in both collections until they start to differ.

It happens that there is an STL algorithm that does just that, and that we encountered in the predicates on ranges with the STL: std::mismatch. We can use it at the beginning of our algorithms:

template<typename ForwardIterator1, typename ForwardIterator2>
bool my_is_permutation(ForwardIterator1 first1, ForwardIterator1 last1,
                       ForwardIterator2 first2, ForwardIterator2 last2)
{
    if (std::distance(first1, last1) != std::distance(first2, last2)) return false;

    auto const [firstDifferent1, firstDifferent2] = std::mismatch(first1, last1, first2, last2);
    
    for (auto current1 = firstDifferent1; current1 != last1; ++current1)
    {
        auto const numberOfOccurencesIn1 = std::count(firstDifferent1, last1, *current1);
        auto const numberOfOccurencesIn2 = std::count(firstDifferent2, last2, *current1);
        if (numberOfOccurencesIn1 != numberOfOccurencesIn2)
        {
            return false;
        }
    }
    return true;
}

The above code uses C++17’s structured bindings, but note that C++11’s std::tie and C++98’s std::pair can achieve an equivalent (but less elegant) result.

Counting each value only once

If our current implementation, if there are several occurences (say, k occurences) of the same value in the first collection, we would count for that value k times in both collections. We can therefore make sure that we haven’t encountered this value before in the first collection:

template<typename ForwardIterator1, typename ForwardIterator2>
bool my_is_permutation(ForwardIterator1 first1, ForwardIterator1 last1,
                       ForwardIterator2 first2, ForwardIterator2 last2)
{
    if (std::distance(first1, last1) != std::distance(first2, last2)) return false;

    auto const [firstDifferent1, firstDifferent2] = std::mismatch(first1, last1, first2, last2);
    
    for (auto current1 = firstDifferent1; current1 != last1; ++current1)
    {
        if (std::find(firstDifferent1, current1, *current1) == current1)
        {
            auto const numberOfOccurencesIn1 = std::count(firstDifferent1, last1, *current1);
            auto const numberOfOccurencesIn2 = std::count(firstDifferent2, last2, *current1);
            if (numberOfOccurencesIn1 != numberOfOccurencesIn2)
            {
                return false;
            }
        }
    }
    return true;
}

Not counting a value that is not in the second collection

When we encouter a value for the first time in the first collection, we count for it in both collections. But if this value is not in the second collection, no need to count for it in the first one!

Indeed, in this case we know for sure that the two collections are not a permutation of one another. We can therefore perform that check first:

template<typename ForwardIterator1, typename ForwardIterator2>
bool my_is_permutation(ForwardIterator1 first1, ForwardIterator1 last1,
                       ForwardIterator2 first2, ForwardIterator2 last2)
{
    if (std::distance(first1, last1) != std::distance(first2, last2)) return false;

    auto const [firstDifferent1, firstDifferent2] = std::mismatch(first1, last1, first2, last2);
    
    for (auto current1 = firstDifferent1; current1 != last1; ++current1)
    {
        if (std::find(firstDifferent1, current1, *current1) == current1)
        {
            auto const numberOfOccurencesIn2 = std::count(firstDifferent2, last2, *current1);
            if (numberOfOccurencesIn2 == 0 || numberOfOccurencesIn2 != std::count(firstDifferent1, last1, *current1))
            {
                return false;
            }
        }
    }
    return true;
}

Note that this is at the expense of losing the name numberOfOccurencesIn1 because we don’t want to instantiate this value if not necessary. One way to get it back would be to explode the if statement into two consecutive if statements, but that could make the function more complex (any opinion on this?).

Not counting the beginning of the first collection

Finally, we don’t need to count from the beginning of the first collection (or rather, the point where the collections start to differ). We can instead start counting from current1, since we checked that we haven’t encountered it before.

Or even from one position after current1 (which we know is not last1 since that’s the stopping condition of the for loop):

template<typename ForwardIterator1, typename ForwardIterator2>
bool my_is_permutation(ForwardIterator1 first1, ForwardIterator1 last1,
                       ForwardIterator2 first2, ForwardIterator2 last2)
{
    if (std::distance(first1, last1) != std::distance(first2, last2)) return false;

    auto const [firstDifferent1, firstDifferent2] = std::mismatch(first1, last1, first2, last2);
    
    for (auto current1 = firstDifferent1; current1 != last1; ++current1)
    {
        if (std::find(firstDifferent1, current1, *current1) == current1)
        {
            auto const numberOfOccurencesIn2 = std::count(firstDifferent2, last2, *current1);
            if (numberOfOccurencesIn2 == 0 || numberOfOccurencesIn2 != std::count(std::next(current1), last1, *current1) + 1)
            {
                return false;
            }
        }
    }
    return true;
}

Customizing the predicate

is_permutation also has an overload that accepts a custom predicate, to compare the elements of the collections together, instead of using operator==.

In our implementation, all the comparisons are performed by other STL algorithms. We can therefore pass the predicate along to those algorithms:

template<typename ForwardIterator1, typename ForwardIterator2, typename Predicate>
bool my_is_permutation(ForwardIterator1 first1, ForwardIterator1 last1,
                       ForwardIterator2 first2, ForwardIterator2 last2,
                       Predicate pred)
{
    if (std::distance(first1, last1) != std::distance(first2, last2)) return false;

    auto const [firstDifferent1, firstDifferent2] = std::mismatch(first1, last1, first2, last2, pred);
    
    for (auto current1 = firstDifferent1; current1 != last1; ++current1)
    {
        auto equalToCurrent1 = [&pred, &current1](auto const& value){ return pred(value, *current1); };
        if (std::find_if(firstDifferent1, current1, equalToCurrent1) == current1)
        {
            auto const numberOfOccurencesIn2 = std::count_if(firstDifferent2, last2, equalToCurrent1);
            if (numberOfOccurencesIn2 == 0 || numberOfOccurencesIn2 != std::count_if(std::next(current1), last1, equalToCurrent1) + 1)
            {
                return false;
            }
        }
    }
    return true;
}

Going further

Our implementation is getting pretty close to the one in libc++, even though it seems shorter. The difference comes mainly from the fact that libc++ doesn’t use any algorithm in its implementation and performs loops instead, which take up more space in code. I’m not sure about the reason why (perhaps to skip some function calls?).

Now that we’re familiar with is_permutation‘s implementation, we are better equiped to examine a surprising requirement the standard has on this algorithm: the two collection must have the same value types.

What consequences does this requirement have? How can we work around its constraints? This is what we’ll see in the next post on std::is_permutation.

Stay tuned!

You may also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin