Jonathan Boccara's blog

Searching when you have access to an STL container

Published January 26, 2017 - 2 Comments

Daily C++

After seeing how to search for values in a range delimited by iterators, let’s see how to operate efficiently when you directly have access to a C++ container.

This is the second one in the series about searching with the STL:

  • Searching in an STL container: how to perform efficient and correct searches when you directly have access to an STL container, as opposed to a simple range,

The main thing to know about searching in STL containers is this: when possible it is preferable to use the container  methods instead of algorithms.

This comes from for 3 reasons:

  • it is faster: in sorted containers, all methods benefit from the rapid logarithmic search in a sorted collection. Also, std::string methods implement optimal algorithms and benefit from the internal representation of the string,
  • it is more natural:
    • std::map and std::multimap methods can directly search for a key, and not a std::pair<Key, Value>, that an algorithm would have to look for because that is what their iterators point to,
    • std::string offers string-specific searching operations like substring search,
  • it is more correct in some cases: in sorted containers (like maps and sets), all methods use equivalence and not equality, which is not the case of some algorithms (like std::count and std::find that use equality).

Now let’s get into more details, by examining how this applies to the various containers the STL offers.

std::vector, std::deque, std::list

These containers do not expose any method related to searching. Only algorithms can be used on them.

std::map, std::multimap, std::set, std::multiset

These containers have 5 class methods that share their names with some algorithms: count, find, equal_range, lower_bound and upper_bound. See all about these algorithms in the first post of the series.

These methods offer several of the 3 advantages explained above:

Container method More correct than algo? Faster than algo? More natural than algo?
count search container C++ search container C++ search container C++
find search container C++ search container C++ search container C++
equal_range just as correct search container C++ search container C++
lower_bound just as correct search container C++ search container C++
upper_bound just as correct search container C++ search container C++

 

  • Better correctness comes from the usage of equivalence instead of equality,
  • Better performance comes from the fact that elements are sorted for sequence containers. And for associative containers, it come from the fact that their iterators are not random-access, so the algorithms cannot perform a bisect by directly jumping the desired elements (they have to start from the start and move up to their position), while the containers don’t have this constraint with their internal representation. Thanks sibecker for pointing this out.
  • They are more natural for maps because the argument passed to the various methods is a key, and not a std::pair<Key, Value>.

Note that there is no container method equivalent to std::binary_search. To check for the existence of a key in a container:

  • for std::map and std::set:
    • compare the result of find with the end iterator, or
    • use the count method: as a method, count does not incur any performance issue, because, like find, it stops at the first key equivalent to the searched one (since there can only be one key equivalent to the searched one, by definition of std::map and std::set)
  • for std::multimap and std::multiset: since count does not stop at the first key equivalent to the searched one, find has an advantage over count here.

Note that in an std::multimap or std::multiset, the find method returns any element equivalent to the searched value, and not necessarily the first one. If you do need the first one, use equal_range for its simple interface, or, in the case where your profiler says that equal_range is too slow because it shows you the whole range while you only need the first element, then you can use lower_bound.
But you’ll have to pay for all its drawbacks that we saw in the topic of searching a range.

std::string

std::string actually has 24 searching methods (!).

They are divided into 6 groups, and each group has 4 overloads.

For all groups the 4 overloads are of the form:

  • search for a string given by a std::string,
  • search for a string given by a char* and a size,
  • search for a string given by a char* (stops at the null character),
  • search for a char.

And all 4 take a start position in the searched string, as a parameter with a default value of 0 (start searching from the beginning of the string).

Here are the 6 groups of methods:

  • find: searches for the first ocurence of the desired string (or char) as a substring,
  • rfind: searches for the last ocurence of the desired string (or char) as a substring,
  • find_first_of: search for the first occurence of any of the char in the desired string (or char),
  • find_last_of: search for the last occurence of any of the char in the desired string (or char),
  • find_first_not_of: search for the first occurence of any char which is not in the desired string (or char),
  • find_last_not_of: search for the last occurence of any char which is not in the desired string (or char).

String algorithms are not trivial to implement in linear time. std::string methods implement them optimally, which you can benefit from when searching for something in a string.

This is it for searching directly in an STL container. The last episode of this series will show you STL searching algorithms that few people know. In the next post though, we’ll take a short break from the searching and from the STL, to focus about a central topic in code expressiveness: how to give good names in your code.

Related articles:

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin

Comments are closed