Jonathan Boccara's blog

The searching <algorithm>s the STL holds secret

Published February 2, 2017 - 5 Comments

Daily C++

Let’s wrap up the series about searching with the STL by reviewing a handful of algorithms that are much less known than those presented in the other posts, but can prove themselves quite useful.

The searching algorithms the STL holds secret

Here is the series about searching with the STL:

  • Searching in an STL container: how to perform efficient and correct searches when you directly have access to an STL container, as opposed to a simple range,
  • The searching <algorithm>s the STL holds secret: exploring algorithms that were unknown to the vast majority of developers I’ve presented this to, but that were deemed useful by those who did learn them.

All the following don’t assume that the elements they operate on are sorted, so they perform comparisons with operator== (or a custom comparator you can provide).

std::find_first_of

This algorithm has a similar behaviour as its counterpart in the std::string class see in Searching in an STL container, but is not restricted to chars and strings:

template <typename ForwardIterator1, typename ForwardIterator2>
ForwardIterator1 find_first_of(ForwardIterator1 first, ForwardIterator1 last,
                               ForwardIterator2 s_first, ForwardIterator2 s_last);

Here std::find_first_of searches the range [first, last[ for the first occurence of any of the element in range [s_first, s_last[.

Note that the 3 other find_*_of methods from std::string don’t have a counterpart in algorithms.

std::adjacent_find

std::adjacent_find searches a range for 2 consecutive identical elements, and returns an iterator on the first element of these two.
If no consecutive elements are found, it returns the end iterator of the range.

Even though it is not very widely known, std::adjacent_find has interesting real-life usages.
Consider the following case: we have a collection in which we want to merge consecutive identical elements together.
Here is a simple way to implement the algorithm using std::adjacent_find:

template <typename ForwardIterator, typename OutputIterator>
OutputIterator mergeAdjacent(ForwardIterator first, ForwardIterator last, OutputIterator results)
{
    ForwardIterator beginUnique = first;
    while (beginUnique != last)
    {     
      // output all unique elements; stop when finding indentical adjacent ones
      ForwardIterator endUnique = std::adjacent_find(beginUnique, last);
      results = std::copy(beginUnique, endUnique, results);
    
      // identify the range of identical adjacent elements
      ForwardIterator beginIdentical = endUnique;
      ForwardIterator endIdentical = std::find_if(beginIdentical, last, [beginIdentical](const auto& element) {return element != *beginIdentical;});
    
      // aggregate identical flows into one
      if (beginIdentical != endIdentical)
      {
         *results = std::accumulate(beginIdentical, endIdentical, typename ForwardIterator::value_type());
         ++results;
      }
      beginUnique = endIdentical;
    }
    return results;
}

Here is how this function operates:

It finds the first occurrence of several identical elements with std::adjacent_find:

ForwardIterator endUnique = std::adjacent_find(beginUnique, last);

All elements before this point are different from their immediate neighbours, so we want to keep them in the output:

std::copy(beginUnique, endUnique, results);

Then it works out until what point the consecutive elements are identical:

ForwardIterator endIdentical = std::find_if(beginIdentical, last, [beginIdentical](const auto& element) {return element != *beginIdentical;});

These identical elements are added up together (which can be customized if you want to do something different from just adding for merging elements):

*results = std::accumulate(beginIdentical, endIdentical, typename ForwardIterator::value_type());

And repeat.

Here is a usage example:

vector<int> v = { 1, 4, 5, 5, 3, 42, 7, 7, 7, 7, 3, 9 };
vector<int> results;
mergeAdjacent(v.begin(), v.end(), back_inserter(results));
// results now contains: 1 4 10 3 42 28 3 9  

Note how the core part of this function was the call to std::adjacent_find.

std::search et al.

Ever wondered why std::find was called this, even though it may not find anything ? Wouldn’t std::search be a more suitable name ?

In my opinion, std::find is called this way because std::search already exists, and does something else. Did you know std::search?

std::search

In essence, std::search is very similar to a substring search inside a string. But it is not limited to chars and string, it can search for the first occurence of a subrange within a range of any type.
Here is its prototype:

template <typename ForwardIterator1, typename ForwardIterator1>
ForwardIterator1 search(ForwardIterator1 first, ForwardIterator1 last,
                        ForwardIterator2 s_first, ForwardIterator2 s_last);

But contrary to std::string methods, std::search does not have to operate in linear time (see the section on Boost further down for how to make sure your search operates in linear time).

std::search has 2 siblings among the algorithms family: std::search_n and std::find_end.

std::search_n

 std::search_n searches for a subrange constitued of n times the same value. Here is its prototype:

template <typename ForwardIterator, typename Size, class T>
ForwardIterator search_n(ForwardIterator first, ForwardIterator last, Size n, const T& value);

std::find_end

Somewhat curiously named, std::find_end searches for the last occurence of a subrange in a range (where std::search searched for the first occurence):

template <typename ForwardIterator1, typename ForwardIterator2>
ForwardIterator1 find_end(ForwardIterator1 first, ForwardIterator1 last,
                          ForwardIterator2 s_first, ForwardIterator2 s_last);

Searching algorithms in Boost

Contrary to std::string methods, std::search does not have to operate in linear time. It is allowed to make s * n comparisons, where n is the length of the range and s the length of the searched subrange.

There are some cleverer algorithms that do operate in linear time, and some of them are available in Boost. Their way of working is outside the scope of this post, but you can find three of them in boost:

  • the Boyer Moore algorithm (under the <boost/algorithm/searching/boyer_moore.hpp> header)
  • the Boyer Moore Horspool algorithm (under the <boost/algorithm/searching/boyer_moore_horspool.hpp> header)
  • the Knuth Morris Pratt algorithm (under the <boost/algorithm/searching/knuth_morris_pratt.hpp> header)

Note that even though they operate in linear time, you are not guaranteed to have better performance in your particular case, for 2 reasons:

  • they bear some overhead, so for short strings in particular they might actually be slower than std::search,
  • there are several types of string and several type of patterns out there (speech, source code, DNA, etc) and some algorithms are more or less performant depending on the type of string they work on.

Moreover, for searching algorithms, the STL is more flexible than Boost, because it lets you search a last occurence (std::find_end), and lets you customize the comparison operator, both of which you can’t do with boost.

So in general do use STL searching algorithms, unless you are sure that Boost’s ones are more performant in your particular case.

And that’s it for searching with the STL (and slightly beyond).

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin

Comments are closed