std::transform, a central algorithm

Published February 13, 2017 - 3 Comments

std::transform is a very useful algorithm.

Let’s see what it can do.

This post is part of the STL learning resource.

std::transform on a range

Essentially, std::transform applies a function to each element of a range:

transform

Here is its prototype:

template<typename InputIterator, typename OutputIterator, typename UnaryOperation>
OutputIterator transform(InputIterator first1, InputIterator last1,
                         OutputIterator result,
                         UnaryOperation op);

As soon as you start working with the STL the need for std::transform appears.

For example, to obtain the keys that a map contains, you can use std::transform the following way:

map<int, string> m = { {1,"foo"}, {42, "bar"}, {7, "baz"} };
vector<int> keys;
std::transform(m.begin(), m.end(), std::back_inserter(keys), getFirst);

where getFirst is a (non-standard) function that takes a pair and returns its first element. And std::back_inserter used above is an output iterator that does a push_back into the container it is passed, every time it is assigned to. This relieves the programmer from the sizing of the output.

The concept of std::transform is so useful that there is a name for it, coming from functional programming: map (unrelated to std::map). In fact, we can see it the other way round: the STL takes its root in functional programming, so it is only normal that a central concept in functonal programming gets a central role in the STL.

std::transform on two ranges

std::transform has a second overload that takes (in essence) 2 ranges and applies a function that takes 2 parameters, on each couple of elements taken from the input ranges:

transform

Here is its prototype:

template<typename InputIterator1, typename InputIterator2, typename OutputIterator, typename BinaryOperation>
OutputIterator transform(InputIterator1 first1, InputIterator1 last1,
                         InputIterator2 first2,
                         OutputIterator result,
                         BinaryOperation op);

However ones needs to be careful when using this overload, because the second range needs to be at least as long as the first one.

Indeed, as shown in the picture and the prototype, std::transform traverses the first range completely, and reads counterparts from the second range. But it has no way to know where the second range actually stops. This overload uses what is called “1.5-Ranges” because the first range is fully provided but the second one misses the end part (for more about 1.5-Ranges see Stephan Lavavej talk STL Features And Implementation Techniques).

For a simple example, here is how to add two ranges of ints by summing together their respective elements:

vector<int> numbers1 = {1, 5, 42, 7, 8};
vector<int> numbers2 = {10, 7, 4, 2, 2};
vector<int> results;
std::transform(numbers1.begin(), numbers1.end(),
               numbers2.begin(),
               std::back_inserter(results),
               [](int i, int j) {return i+j;});

The concept of applying a function on 2 ranges also has a name coming from functional programming: zip.

std::transform in place

The output range can be any of the 2 input ranges. In that case the range is transformed “in place”.

How is std::transform in place on one range different from std::for_each? Indeed, both apply a function on each element.

There are actually 2 main differences, one is technical and relatively not important in practice, and the other one is more important:

the not important, technical one: from a standard point of view, for_each offers more guarantees than transform, namely:
- the range is traversed in order from the first element to the last,
- the function (or function object) is not copied during the traversal.

As a consequence, you could theoretically control state in your function object with for_each. But in general you don’t really want state in your functors anyway.

the important one: for_each and transform just don’t do the same thing on a given element:
- for_each applies a function on the element,
- transform applies a function on the element, and assigns the result back to the element.

So there are things for which for_each is more appropriate. For example, for_each should be preferred for having side effects in a more general sense (IO output, logging, etc.), because transform just says that… it transforms your elements.

“transform_if”?

I’ve seen quite a few people starting to use std::transform, and who soon encountered the need to apply a transformation on a restricted part of the elements of a range. Such elements would be identified by a predicate.

So on the model of the std::copy_if algorithm, which copies only elements that satisfy a predicate, the first thing that comes to mind would be to have an algorithm called “transform_if”. But there is no such thing as transform_if in the STL, nor in Boost, nor anywhere else to my knowledge.

This in itself is a hint that maybe such an algorithm isn’t the best solution to the need expressed above. And there are indeed things that would be wrong with such a solution:

it would be a function that do two things: filtering on a predicate AND applying a function,

in which order should you pass the predicate and the function? In some cases (particularly with bool and int being implicitly convertible to one another), passing them in the wrong order would compile but not do what you intended. Although this could be arguably fixed with strong types, as shown in a dedicated post scheduled for February 21.