std::exchange Patterns: Fast, Safe, Expressive, and Probably Underused

Published September 25, 2020

This is a guest post from Ben Deane. Ben is a lifelong learner and algorithms fan who works in the finance industry and loves to write expressive, well-performing C++. You can find him on twitter @ben_deane.

This blog post has been a long time in the making. I gave a lightning talk on std::exchange at CppCon 2017; Jonathan first asked me to write something about std::exchange in January 2019; now here we are in the strange days of the second half of 2020. But although much has changed in the outside world, I would guess that not much has changed in most C++ codebases and in the minds of many C++ programmers with respect to using std::exchange. It could still do with more publicity and more recognition of potential use cases.

You’re already using something very like `std::exchange`

I’ll start by making a perhaps surprising assertion: you’re almost certainly already using a construct that is semantically equivalent to std::exchange.

You don’t have to be using modern C++, because this construct has been in C++ since day one. You don’t even have to be using C++, because this construct was in C, and is therefore in many C-influenced languages. In fact it’s been with us for maybe 50 years or more, ever since Ken Thompson wrote the B programming language.

It’s the humble increment operator. To be precise, the postfix increment operator.

When we write i++, it’s exactly the same as writing std::exchange(i, i+1). We can see this by considering two possible implementations of strcpy:

char *idiomatic_strcpy(char* dest, const char* src) {
    while ((*dest++ = *src++));
    return dest;
}

char *exchange_strcpy(char* dest, const char* src) {
    for (;;) {
        auto s = std::exchange(src, src+1); 
        auto d = std::exchange(dest, dest+1);
        *d = *s;
        if (*d == 0) break;
    }
    return dest;
}

(code on godbolt here)

And they optimise to exactly the same assembly output [1].

There is even a vestigial clue in C++ that postfix increment is the same as std::exchange: postfix operator++ takes a dummy int argument. This differentiates it from its prefix counterpart, but is it just a coincidence?

struct S {
    constexpr auto& operator++() { 
        ++i; 
        return *this;
    }
    constexpr auto operator++(int) { 
        auto ret = *this; 
        ++i; 
        return ret; 
    }
    int i{};
};

int main() {
    S s{};
    ++s;
    s++;
    return s.i;
}

We could actually take this further by using the “dummy” argument, and end up with something that’s almost exactly like std::exchange.

struct S {
    constexpr auto operator++(int incr) { 
        auto ret = *this; 
        i = incr;
        return ret; 
    }
    int i{};
};

int main() {
    S s{};
    s.operator++(17);
    return s.i;
}

I don’t particularly recommend abusing the conventional usage of operator++ like this, but it serves to illustrate the point [2].

Although postfix increment may not be nearly as widespread in a typical codebase as prefix increment, we usually have no problems using it or reasoning about its use where it leads to concise, readable code [3]. And so it should be with std::exchange.

The “swap-and-iterate” pattern

I’ve found extensive use for std::exchange wherever I would previously use the “swap-and-iterate” pattern. This pattern occurs a lot in event-driven architectures; one might typically have a vector of events to dispatch or, equivalently, callbacks to invoke. But we want event handlers to be able to produce events of their own for deferred dispatch.

class Dispatcher {
    // We hold some vector of callables that represents
    // events to dispatch or actions to take
    using Callback = /* some callable */;
    std::vector<Callback> callbacks_;

    // Anyone can register an event to be dispatched later
    void defer_event(const Callback& cb) {
        callbacks_.push_back(cb);
    }

    // All events are dispatched when we call process
    void process() {
        std::vector<Callback> tmp{};
        using std::swap; // the "std::swap" two-step
        swap(tmp, callbacks_);
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};

This is the “swap-and-iterate” pattern. It is safe for the callbacks to call defer_event and therefore produce events of their own: we use tmp so that a call to defer_event does not invalidate the iterator in our loop.

But we’re doing a bit more work here than necessary, and we’re also guilty of incurring the “ITM antipattern” [4]. First, we construct an empty vector (tmp), then — with swap — we have 3 move assignments before we get to the business of iterating.

Refactoring with std::exchange solves these problems:

class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        for (const auto& callback : std::exchange(callbacks_, {}) {
            std::invoke(callback);
        }
    }
};

Now we don’t have to declare a temporary. Inside std::exchange we have one move construction and one move assignment, saving one move compared to swap. We don’t need to understand the ADL dance involved in the “std::swap two-step” [5]. We didn’t need tmp — just a way to express the empty vector, which here is {}. And the compiler is really good at optimising the call to std::exchange, so of course we get the copy elision we would normally expect. As a result, the code overall is more concise, faster, and provides the same safety as before.

Posting to another thread

A similar pattern occurs in any multithreaded setting where we want to capturean object in a lambda expression and post it to another thread. std::exchange allows us to efficiently transfer ownership of an object’s “guts.”

class Dispatcher {
    // ...

    void post_event(Callback& cb) {
        Callback tmp{};
        using std::swap;
        swap(cb, tmp);
        PostToMainThread([this, cb_ = std::move(tmp)] {
            callbacks_.push_back(cb_);
        });
    }
};

Here we are taking ownership of the passed-in callback by swapping it into a temporary, then capturing that temporary in a lambda closure. We are capturing by move in an attempt to improve performance, but ultimately we’re still doing a lot more than we need to.

class Dispatcher {
    // ...

    void post_event(Callback& cb) {
        PostToMainThread([this, cb_ = std::exchange(cb, {})] {
            callbacks_.push_back(cb_);
        });
    }
};

This gets us exactly what we want — again with more expressive code — and we’re asking the processor to do less. Once more, std::exchange uses one move fewer then std::swap, and copy elision, a.k.a. the return value optimization, constructs the return value directly into the lambda expression’s closure.

Why not just move?

But, I hear you ask, why do more than one move at all? Why not something like this?

class Dispatcher {
    // ...

    void post_event(Callback& cb) {
        PostToMainThread([this, cb_ = std::move(cb)] {
            callbacks_.push_back(cb_);
        });
    }
};

The answer is to ensure future maintainability and flexibility. It may well be true that a moved-from Callback is considered just as empty as if we had explicitly emptied it with std::exchange, but is that obvious? Is it always going to be true? Will we ever need to update that assumption — or this code — if we change the type of Callback later on?

In the major STL implementations, it is currently the case that a moved-from container is empty. More specifically, sequenced containers like std::vector; associative containers like std::unordered_map; and other “containers” like std::string or std::function are empty after move, even when they are small-buffer-optimised [6].

But this is not necessarily true of every single container type we might use. There is no particular reason why a homegrown small-buffer-optimised vector should be empty after we move from it. We find a notable standard counterexample of the “normal” behaviour in std::optional, which is still engaged after being moved from. So yes, using std::move — obviously — only incurs one move, whereas std::exchange incurs two, but at the cost of abstraction leakage. Using only std::move, we need to know and be able to reason about the move-related properties of the container we use; future maintainers (usually ourselves, in 6 months’ time) also need to know about that “empty after move” constraint on the code, which is not explicitly expressed anywhere and not obvious from inspection.

For this reason, I recommend being explicit about clearing objects that are supposed to be empty, and std::exchange can do just that. In fact cppreference.com notes a primary use case for std::exchange in writing the move special member functions to leave the moved-from object cleared.

Can we use `std::exchange` with locks?

I want to go back to thinking about multithreaded code, because it may seem at first that std::exchange isn’t a great option when we need to access something under mutex protection:

class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        std::vector<Callback> tmp{};
        {
            using std::swap;
            std::scoped_lock lock{mutex_};
            swap(tmp, callbacks_);
        }
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};

Here, the vector of callbacks is protected by a mutex. We cannot afford to hold this lock while iterating, because any event handler that wants to generate an event will try to lock the mutex in order to queue its event [7].

So we can’t use our std::exchange pattern naively:

class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        std::scoped_lock lock{mutex_};
        for (const auto& callback : std::exchange(callbacks_, {})) {
            std::invoke(callback);
        }
    }
};

because that would break our ability to queue events from callbacks. The solution, as is so often the case, is to use a function. In this instance, an immediately-invoked lambda expression fits the bill nicely.

class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        const auto tmp = [&] {
            std::scoped_lock lock{mutex_};
            return std::exchange(callbacks_, {});
        }();
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};

We reap the benefits of holding the lock for as short a time as possible; taking advantage of return value optimization; saving a move; and concision of expression.

Were I being deliberately provocative — such as in a lightning talk — I might also suggest the following:

class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        const auto tmp = (std::scoped_lock{mutex_}, std::exchange(callbacks_, {}));
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};

Here, the scoped_lock lives until the semicolon, and the result of the comma operator is the result of std::exchange, used to construct tmp. I concede that many people would recoil in horror at this use of the comma operator, but that’s a topic for another article [8].

Consider `std::exchange` over `std::swap`

To sum everything up, I believe that std::exchange is still underused, and situations where it can be usefully applied are probably under-recognised. Whenever you find yourself writing swap, consider: do you really need that temporary?

Footnotes

[1]: Yes, I know in real life, strcpy unfortunately returns a copy of the dest passed in. It would be more useful — as I wrote here — to return where dest ends up. I also know that strcpy is unsafe, but I’m using it as an example.

[2]: I do, however, recommend marking the postfix increment operator [[nodiscard]]. To my knowledge there is no way to get a warning on any compiler for throwing away the result of a builtin operator++.

[3]: Most modern style advice prefers prefix increment, using postfix increment only where necessary — which is to say, exactly where we need its “return value,” as we sometimes do.

[4]: Conor Hoekstra expounds on the “ITM” (initialize-then-modify) antipattern in his recent MUC++ talk.

[5]: The “std::swap two-step” is explained by Arthur O’Dwyer here.

[6]: There are well-thought-out reasons for this. It’s not as simple as “not clearing a small-buffer-optimised std::string must be cheaper than clearing it”. Ask your local standard library implementer for details.

[7]: We could use a recursive_mutex to handle locking re-entrancy, but I try to avoid such lazy-thinking solutions. They usually lead to an erosion of code reason-ability.

[8]: This construction may also fall foul of the [[nodiscard]] attribute which can usefully be applied to lock objects, precisely to prevent immediate unlocking of accidentally-unnamed locks.