Jonathan Boccara's blog

The BooSTL Algorithms: Boost Algorithms That Extend the STL (1/3)

Published March 29, 2019 - 0 Comments

Daily C++The STL features a proud 105 algorithms, but that is by no means all the algorithms there is in C++.

There are many ways to extend the STL. One of them is to include the STL-like algorithms that are in Boost, which I like to call the BooSTL algorithms!

Just like the STL algorithms, those functions are assets for every C++ developer. They are useful to know, and their implementations are instructive.

The algorithms in Boost fall into two categories: the algorithms that don’t exist in the STL, and the algorithms that are added in the STL in some version of C++ (for instance, if you don’t have C++17 but have a recent version of Boost, you’ll get in it the algorithms that are added in C++17, such as exclusive_scan for example).

Here we’re going to focus on the algorithms that are not in any version of the STL (at least as of this writing). Most of them are located in the Boost.Algorithm library authored by Marshall Clow, but some of them are scattered across other libraries in Boost, such as Boost.Sort for example. If you see some algorithms of Boost that I have not included, do let me know and I’ll add them.

For each of the presented algorithms, you will find a link to its source code or the whole implementation itself when it is not too big. It can be useful if you don’t use Boost and want to see how they are implemented, or even if you’re just curious about them. Indeed, they are generally elegant and are a good source of inspiration for writing good code.

There is a lot of contents in the BooSTL, and all of it is good to know in my opinion. So to make it easier to digest, we’re going to chunk this up into 3 articles:

  • the BooSTL algorithms on sorting and partitioning,
  • the BooSTL algorithms on searching,
  • the other BooSTL algorithms.

Let’s get started with the Boost algorithms on sorting and partitioning that extend the STL then! boost algorithms

The BooSTL algorithms of sorting

spreadsort

In the world of sorting, there are at least two approaches: comparison-sort and radix-sort.

Comparison-sort consists in comparing together the values inside a collection with operator< (or a custom equivalent) and depending on the result, rearranging the order of the elements until getting to a sorted collection.

This approach includes quick sort, merge sort, bubble sort and all the classics that we learn at school. Radix-sort is a different approach, as it does not compare values with operator<. Instead, it positions at the beginning of the collection all the values with a most significant digit (MSD) of 0, and at the end of the collection of those with a MSD of 1.

Said differently, it starts by partitioning the collection with the predicate “MSD == 0”; After doing this, it partitions both halves of the collection with the predicate “second MSD == 0”. And so on, recursively, until getting to buckets of size 1 and the whole collection being sorted.

Strictly speaking, this operation only applies to binary numbers. But there are special implementations of radix-sort adapted to other types, such as primitive types or std::string.

Radix-sort seems to be faster than comparison-sort on large collections, and slower on small collections. Spreadsort is a mix of the two: it starts with a radix-sort until getting buckets smaller than a certain threshold, and then finishes off the operation by performing comparison-sorts on the buckets. At least in theory, spreadsort should be as fast or faster than radix-sort and comparison-sort. Boost offer boost::sort::spreadsort::spreadsort on primitive types and std::strings in the header boost/sort/spreadsort/spreadsort.hpp.

#include <vector>
#include <boost/sort/spreadsort/spreadsort.hpp>

int main()
{
    std::vector<double> numbers = {1.1, 5.5, -2.2, 3.3, -7.7};
    
    boost::sort::spreadsort::spreadsort(begin(numbers), end(numbers));
}

sort_subrange

As its name suggests, boost::sort_subrange, rearranges the elements of the collection so that those in a certain subrange are at the positions they would be in if the whole range had been sorted. This algorithm takes 4 iterators: two to indicate the whole range, and two to indicate the subrange inside of the whole range:

#include <iostream>
#include <vector>
#include <boost/algorithm/sort_subrange.hpp>

int main()
{
    std::vector<int> numbers = { 4, 1, 7, 8, 0, 5, 2, 10, 6, 9, 3 };
    
    boost::algorithm::sort_subrange(begin(numbers), end(numbers),
                                    begin(numbers) + 3, begin(numbers) + 6);
    
    for (auto const& number : numbers) std::cout << number << ' ';
}

The above code outputs:

1 0 2 3 4 5 6 7 8

The sorted subrange contains 2 3 4 5 6. The version of sort_subrange in the above example uses operator< to compare the elements of the collection. sort_subrange also has an overload that accepts another comparison predicate if you want to compare on something else than operator<. Since the implementation of this algorithm is elegant and short, let’s have a look at it:

template<typename Iterator, typename Pred> 
void sort_subrange (
  Iterator first,     Iterator last, 
  Iterator sub_first, Iterator sub_last,
  Pred p)
{
  if (sub_first == sub_last) return; // the empty sub-range is already sorted.
  
  if (sub_first != first) { // sub-range is at the start, don't need to partition
  	(void) std::nth_element(first, sub_first, last, p);
  	++sub_first;
  	}
  std::partial_sort(sub_first, sub_last, last, p);
}

As you can see it is implemented with the STL algorithms partial_sort and nth_element. sort_subrange uses nth_element to do two things:

  • put the right value at the first position of the subrange,
  • partition the collection so that the elements that are not lower than that value are on its right.

Then it uses partial_sort to sort the beginning of the collection starting at the second position of the subrange (no need to sort the first position since nth_element put the right value there already). This collection is sorted until the end of the subrange. sort_subrange Not familiar with those STL algorithms on sorting? Have a look at Sorting with the STL! The source code of boost::algorithm::sub_range is available in boost/algorithm/sort_subrange.hpp. Now sort_subrange has a cousin living in…

The BooSTL algorithms of partitioning

 partition_subrange

partition_subrange is a relative of sort_subrange because it also performs an operation a subpart of a collection. But this operation is partitioning.

Partitioning consists in reordering the elements of a collection according to a predicate: the elements that satisfy the predicate are put at the beginning of the collection, and those that don’t satisfy the predicate are put after them.

What does it mean to perform a partition on a subrange? You may think that it means that the subrange contains the elements it would have if the whole collection was partitioned. But it’s not that. Rather, partition_subrange gathers the elements that would have been in the subrange if the whole collection was sorted, but without sorting the elements inside of the subrange. So it partitions the collections into 3 consecutive parts:

  • the elements that are smaller than those of the subrange,
  • the elements that would populate the subrange if the whole collection was sorted (but in any relative order),
  • the elements that are not smaller than those of the subrange.

partition_subrange Like sort_subrange, it takes 4 iterators: two to indicate the whole range, and two to indicate the subrange inside of the whole range:

#include <iostream>
#include <vector>
#include <boost/algorithm/sort_subrange.hpp>

int main()
{
    std::vector<int> numbers = { 4, 1, 8, 7, 0, 5, 3, 6, 2 };
    
    boost::algorithm::partition_subrange(begin(numbers), end(numbers),
                                         begin(numbers) + 2, begin(numbers) + 7);

    for (auto const& number : numbers) std::cout <<  number << ' ';
}

The above code outputs:

1 0 2 5 4 3 6 7 8

The subrange contains 2 5 4 3 6, which are the elements (given in any order) that would populate the subrange if the whole range was sorted. Can you guess the implementation of partition_subrange? It’s in the same spirit as the one of sort_subrange. Why don’t you take a moment and think about how you would have implemented it? Done yet? Here it is:

template<typename Iterator, typename Pred> 
void partition_subrange (
  Iterator first,     Iterator last, 
  Iterator sub_first, Iterator sub_last,
  Pred p)
{
  if (sub_first != first) {
  	(void) std::nth_element(first, sub_first, last, p);
  	++sub_first;
  	}
  
  if (sub_last != last)
  	(void) std::nth_element(sub_first, sub_last, last, p);
}

boost::algorithm::partition_subrange doesn’t have its own header. It is also located in boost/algorithm/sort_subrange.hpp.

is_partitioned_until

The STL has several algorithms of the form is_something_until, such as std::is_sorted_until and std::is_heap_until. They return an iterator that indicates the first position at which a certain property is no longer satisfied.

For example, is_sorted_until returns the iterator pointing to the position such that the subrange before it is sorted, and is no longer sorted if you extend it of one element. And is_heap_until returns an iterator such that the subrange before it respects the heap property (see STL algorithms on heap), and does not any more if you extend it of one element.

Surprisingly, the STL does not have a is_partitioned_until algorithm that would return the first position where a range is no longer partitioned (even though the STL has a std::is_partitioned algorithms that checks if a whole range is partitioned). Boost offers boost::algorithm::is_partitioned_until to fill this gap: is_partitioned_until Can you guess the implementation of is_partition_until? Here it is:

template <typename InputIterator, typename UnaryPredicate>
InputIterator is_partitioned_until ( InputIterator first, InputIterator last, UnaryPredicate p )
{
//  Run through the part that satisfy the predicate
    for ( ; first != last; ++first )
        if ( !p (*first))
            break;
//  Now the part that does not satisfy the predicate
    for ( ; first != last; ++first )
        if ( p (*first))
            return first;
    return last;
}

If the predicate is “being blue” as in the above schema, the algorithms traverses the collection without stopping unless it finds a white element.

From this point on, the result is the first blue element (or the end of the collection if it comes first). boost::algorithm::is_partitioned_until is available in the header boost/algorithm/is_partitioned_until.hpp.

This is it for sorting and partitioning. If you see some algorithms in Boost related to those topics that are missing here, please drop me a comment below. Next up, the BooSTL algorithms on searching. Stay tuned!

Related articles:

  • The BooSTL algorithms – part 2
  • The BooSTL algorithms – part 3
Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin