Know your algorithms: algos on sets

Published January 9, 2017 - 6 Comments

This post is part of the STL learning resource. To get the bigger picture of the STL topics that I intend to cover on Fluent C++, you can go have a look at the dedicated page at fluentcpp.com/STL.

The purpose of this series of posts is to give you an opportunity to accumulate — no pun intended! — knowledge on the STL one bit at a time, this time focusing on algorithms on sets.

Here the word “set” is taken in the general sense of a collection of elements, and not only std::set, provided that the range is sorted. Indeed all the algorithms mentioned in this post require their input ranges to be sorted. Similarly their output ranges — when they produce one — are sorted too.

Taking parts of 2 sets

The STL features 4 complementary algorithms that can take various parts of 2 given sets. They have a common form of prototype taking two ranges in input and putting their results in one range in output:

template<typename InputIterator1, typename InputIterator2, typename OutputIterator>
OutputIterator algo(InputIterator1 first1, InputIterator1 last1,
                    InputIterator2 first2, InputIterator2 last2,
                    OutputIterator result);

So for two sorted collections A and B, the invocation of one such algorithm would typically look like:

algo(A.begin(), A.end(), B.begin(), B.end(), result);

result can typically be an std::back_inserter over a vector as seen in this post, or any other output iterator.

For the examples, let’s consider 2 sets A and B.

`std::set_difference`

std::set_difference copies into result all the elements that are in A, but not in B.

set_difference

Here is what code using std::set_difference looks like:

#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

std::vector<int> A = ... // sorted vector
std::set<int> B = ... // std::set is always sorted

std::vector<int> results;

std::set_difference(A.begin(), A.end(),
                    B.begin(), B.end(),
                    std::back_inserter(results));

`std::set_intersection`

std::set_intersection copies into result all the elements from A that are also in B.

set_intersection

`std::set_union`

std::set_union copies into result all the elements that are in A, in B, or in both. For those that are in both, the A versions will be taken (unless there are more occurrences of a common element in B than in A, in which case its additional versions in B are also taken).

set_union

`std::set_symmetric_difference`

Behind its funny name, set_symmetric_difference simply copies into result the elements that are in A but not in B, and those that are in B but not in A.

set_symmetric_difference

set_symmetric_difference is a particularly good example of algorithm that sounds like complicated when you come across it in a reference site listing all algorithms. But you can see that it is in reality very simple to understand and can be useful in day-to-day coding. This happens fairly often with STL algorithms.

Comparing 2 sets

We will see more algorithms comparing two collections in the post dedicated to predicates on collections, but here I want to specifically mention std::includes, because it operates on sets, that are collections of elements in sorted order as explained earlier.

Given 2 sorted collections A and B, std::includes checks whether all the elements of B are also in A.

includes

Its prototype is:

template<typename InputIterator1, typename InputIterator2>
bool std::includes(InputIterator1 first1, InputIterator1 last1,
                   InputIterator2 first2, InputIterator2 last2 );

and it is typically used the following way:

bool AincludesB = std::includes(A.begin(), A.end(), B.begin(), B.end());

Merging 2 sets

`std::merge`

std::merge is used to merge two sorted collections into one sorted collection. Its prototype is:

template<typename InputIterator1, typename InputIterator2, typename OutputIterator>
OutputIterator merge(InputIterator1 first1, InputIterator1 last1,
                     InputIterator2 first2, InputIterator2 last2,
                     OutputIterator result);

and given 2 sorted collections A and B, merging A and B into a sorted range starting at result is typically done the following way:

std::merge(A.begin(), A.end(), B.begin(), B.end(), result);

`std::inplace_merge`

Let’s state it clearly: while all the algorithms of this posts are frequently useful in code, std::inplace_merge is very seldom used. I want to describe it for the sake of comprehensiveness since one of the objectives of the STL learning resource is to cover ALL algorithms, but if you’re not curious about the algorithms and merely interested in practical consequences on your code, you can just skip over to the next section.

Ok, so if you’re still here let’s dig into std::inplace_merge. This algorithm takes one collection and does a merge directly inside it. As a comparison, std::merge took two collections and output its results in a third one. std::inplace_merge regards the collection it operates on as two consecutive parts, and merges the first part with the second one.

More precisely, its prototype is

template<typename BidirectionalIterator>
void inplace_merge(BidirectionalIterator first, BidirectionalIterator middle, BidirectionalIterator last );

where:

first is the beginning of the collection, which is also the beginning of the first part,
middle is the beginning of the second part of the collection,
last is the end of the collection, which is also the end of its second part.

People I show this to often ask the following question: how does std::inplace_merge differ from std::sort ? And the answer lies in the difference in the pre-condition: in std::inplace_merge, the first part and the second part are already themselves sorted, because std::inplace_merge is an algorithm on sets.(there is also a rather technical difference, which is that std::sort requires random access iterators while std::inplace_merge only requires bidirectional iterators).

Where can std::inplace_merge be useful ? Typically in the implementation of a merge sort algorithm.

Why sorted?

All the algorithms seen in this post require that their input and output ranges are sorted. This is important to remember for two reasons:

if you pass input ranges that are not sorted to any of these algorithms, results will be wrong. Indeed these algorithms take assumptions based on the fact that input ranges are sorted. If this is not true then these assumptions become false.

these assumptions let the algorithms perform their job more quickly: typically in a O(n) complexity instead of a O(N*logN) that would have been incurred on unsorted ranges.

Conclusion

We saw all the algorithms the STL offers to operate on sets, that are collections of sorted elements, in the general sense.

How do all these algorithms compare the elements they manipulate, in order to check what to do with them ? It is crucial to understand this when using these algorithms, and it will be the subject of a dedicated post (scheduled for Jan 31).

Now I want to ask you: what did you think of this post? Was it useful to you? Please, share your feedback and let me know if this kind of presentation of STL algorithms is helpful for you. This will help me shape future posts in order to bring you the most value that I can.