Sorting with the STL algorithms

Published October 3, 2017 - 1 Comment

C++ offers more functionalities about sorting that meets the eye. Let’s see what the STL and Boost can do on this topic.

Sorting a whole range

The standard function to sort a whole range is std::sort. It operates in O(n*log(n)) and applies the sort directly on the passed range. The comparison used is operator< by default, or you can provide a custom comparator to sort on a different order.

std::sort does not guarantee to keep the relative order of equivalent elements. It means that several equivalent elements can end up in a different order than they were before the sort. If you need this guarantee, use std::stable_sort. It is slightly slower in algorithmic complexity: it can keep O(n*log(n)) complexity if additional memory is available, and can get up to O(n*log(n)²) otherwise. But as always with performance, it is not an issue until your profiler shows that the bottleneck of your program happens to be in std::stable_sort.

If you need to check if a range is sorted, simply use std::is_sorted.

Sorting the beginning of a range

If you don’t want or don’t need a range to be completely sorted, use std::partial_sort to sort the range up to a certain point. Here is its interface:

template< typename RandomIt >
void partial_sort(RandomIt first, RandomIt middle, RandomIt last);

middle indicates the point up to which elements are sorted after the application of partial_sort (middle not included). All elements after this position are guaranteed to be greater than those before this position, but their relative order is not specified. Its complexity is approximately (last – first) * log(middle – first).

std::partial_sort does not guarantee to keep the relative order of equivalent elements, even in the sorted part of the range. And there is no such algorithm as “partial_stable_sort”.

Note that std::partial_sort_copy does the same thing as std::partial_sort, but it outputs its results in another range, leaving the range passed to the algorithm unchanged.

If you need to check whether a range is partially sorted, use std::is_sorted_until.

Contrary to std::is_sorted that returns a bool, std::is_sorted_until returns an iterator pointing to the last element up to which the elements of the range are sorted (this element not included). Here is its interface:

template< typename ForwardIt >
ForwardIt is_sorted_until(ForwardIt first, ForwardIt last);

Sorting only one element of a range

You can choose a position in a range and put there the element that should be at this position if the range were sorted. This is done with std::nth_element:

template< typename RandomIt >
void nth_element(RandomIt first, RandomIt nth, RandomIt last);

Then all elements before the nth location are in unspecified relative order, but are smaller or equivalent to the elements after the nth location, which are also in unspecified relative order.

The complexity of std::nth_element is in O(n).

By using a combination of std::partial_sort and std::nth_element, you can sort a subrange of a collection that doesn’t start at its beginning. For this, you can apply a std::nth_element at the beginning of the subrange, and then a std::partial_sort from this point to the end of the subrange.

An additional sorting function in Boost

Boost provides another type of sorting algorithm that can be even faster than a Quicksort: spreadsort (boost::sort::spreadsort::spreadsort). Spreadsort is a combination of comparison sort and radix-based sort.

Comparison sort and Radix-based sort

Classical sorting algorithms (such as Quicksort) are based on elements comparisons (with operator< or custom equivalent). For this reason they are called comparison-based algorithms.

Radix-sort operates differently. Essentially, a radix-sort algorithm groups all elements by most significant digit. Then in each group with same most significant digit, it groups element by second most significant digit, and so on. This operates recursively and effectively sorts a collection. This type of radix-sort if called MSD, for Most Significant Digit. LSD (for Least Significant Digit) proceeds the other way around.

Boost spreadsort: a mix of the two

Radix-sort tends to be more efficient than comparison-sort for bigger collections (>=1000 elements). Spreadsort does a mix of radix sort and comparison sorts, making it a hybrid sorting algorithm. It starts with radix sorting until groups become too small for radix-sorting to be efficient and, from that point on, it performs comparison sorting within the groups.

For small collections (<1000 elements), spreadsort falls back on std::sort.

All in all, spreadsort should be more efficient than std::sort, or at least as good for smaller collections. Boost supports spreadsort for integers, floats and std::string.